Making the Genome Huge: The Case of Triatoma delpontei, a Triatominae Species with More than 50% of Its Genome Full of Satellite DNA

The genome of Triatoma delpontei Romaña & Abalos 1947 is the largest within Heteroptera, approximately two to three times larger than other Heteroptera genomes evaluated so far. Here, the repetitive fraction of the genome was determined and compared with that of its sister species Triatoma infestans Klug 1834, in order to shed light on the karyotypic and genomic evolution of these species. The T. delpontei repeatome analysis showed that the most abundant component of its genome is satellite DNA, which makes up more than half of the genome. The T. delpontei satellitome includes 160 satellite DNA families, most of which are also present in T. infestans. In both species, only a few satellite DNA families are overrepresented in the genome, and these families are the building blocks of the C-heterochromatic regions. Two of the satellite DNA families that form the heterochromatin are the same in both species. However, some satellite DNA families that are highly amplified in the heterochromatin of one species occur at low abundance, and in the euchromatin, in the other. Therefore, the present results depict the great impact of satellite DNA sequences on the evolution of Triatominae genomes. Within this scenario, satellitome determination and analysis led to a hypothesis that explains how satDNA sequences have grown in T. delpontei to produce the largest genome known among true bugs.


Introduction
The subfamily Triatominae (Hemiptera: Heteroptera: Reduviidae) comprises more than 150 blood-sucking species distributed in 18 genera, with Triatoma Laporte, 1832 being by far the largest genus, with 82 species [1,2]. These insects are vectors of Chagas disease, the most serious human parasitic disease in Latin America, which affects 6-7 million people worldwide [3]. In the absence of vaccines or adequate drugs for large-scale treatment, reducing the disease burden critically depends on controlling transmission by triatomine vectors [4]. Knowledge of the genetics of these insects is therefore an important factor for successful control campaigns.
In the Southern Cone of South America, the main vector species is T. infestans, which has a great capacity to colonize human dwellings [5]. It is a highly polymorphic species in both ecological and genetic traits. T. infestans comprises two chromosomal groups, Andean and non-Andean [6,7], which were later supported by nuclear and mitochondrial DNA sequences [8,9] as well as by cuticular hydrocarbon patterns [10]. In contrast, the evolutionarily related species T. delpontei has been much less studied. T. delpontei inhabits a relatively small [...] fluorescence in situ hybridization (FISH) were located in the euchromatic regions, and the hybridization pattern resembles that seen in GISH assays. Given that GISH analysis revealed a similarity between the overall repetitive content of T. infestans and T. delpontei, it is probable that at least some satDNA families are shared between both species. By extension, the difference in genomic DNA content between T. infestans and T. delpontei could be explained by the expansion of the same satDNA families, or of additional ones, in the heterochromatin. To address these ideas, the T. delpontei repeatome was determined, with a special focus on its satellitome. In addition, a more detailed analysis of the T. infestans satellitome was performed to allow a better comparison with T. delpontei.

Samples and DNA Extraction
Individuals of T. delpontei were collected in the Rivadavia department, Salta province, Argentina. Genomic DNA was extracted from a single male; because males are the heterogametic sex in this species, this allowed information to be retrieved for all chromosomes, including the X and Y. DNA was extracted from head and leg muscles using a commercial kit (Gentra Puregene, Qiagen, Hilden, Germany) according to the manufacturer's guidelines. For chromosome preparations, males from several localities in Argentina and Uruguay were dissected, and the testes were fixed in ethanol:glacial acetic acid (3:1) and stored at −20 °C.

Genome Sequencing and Graph-Based Clustering of Sequencing Reads
Genomic DNA was sequenced on the Illumina HiSeq 2500 platform at Macrogen. This low-coverage sequencing yielded around 2.2 Gb of 101 bp paired-end reads. Raw reads were quality-trimmed and adaptors were removed with Trimmomatic (v.0.36) [54]. FASTQ files were converted to FASTA with seqtk version 1.3-r106 (https://github.com/lh3/seqtk, accessed on 5 September 2022). The first step in the repeatome characterization was to run the RepeatExplorer2 pipeline [53,55], including the TAREAN analysis, on the Galaxy portal (https://repeatexplorer-elixir.cerit-sc.cz, accessed on 6 September 2022). For this analysis, a total of 16,000,000 paired reads were randomly selected. Default options were used except for the computing time (extra-long), the filtering of the most abundant repeats, and the analysis threshold, which was set to 0.001%. The repeatome annotation was performed on the clusters representing more than 0.001% of the genome (top clusters). A custom database with the satDNA families of T. infestans [26] and Rhodnius prolixus Stål, 1859 [31] was included in the analysis to test for the presence of these satDNAs in the T. delpontei genome, and the annotation was also aided by a custom database of T. infestans and R. prolixus TEs. Clusters whose reads were similar to T. infestans satDNAs, clusters identified as satDNAs by TAREAN, and other clusters with satDNA-compatible graphs were analyzed in detail. For each cluster, the contig with the highest number of reads was used to produce a dot plot with Geneious Pro v.4.8.5 (https://www.geneious.com, accessed on 23 November 2022) to determine whether tandem repeats were present and to estimate the size of the repeat unit. All reads containing satDNA sequences in each cluster were aligned with Geneious Pro v.4.8.5, and a consensus sequence was then built for each family.
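In a self-vs-self dot plot of a tandem array, diagonals run parallel to the main one at an offset equal to the repeat-unit length. The same idea can be sketched without drawing a plot: for each offset k, measure how often position i matches position i + k, then take the smallest offset with near-perfect identity. This is only an illustration of the principle, not the Geneious procedure used in the study. The helper names `self_identity` and `estimate_period` and the 0.9 identity cutoff are hypothetical.

```python
def self_identity(seq: str, k: int) -> float:
    """Fraction of positions i where seq[i] == seq[i + k].

    This is the identity along one off-diagonal of a self-vs-self dot plot.
    """
    n = len(seq) - k
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + k])
    return matches / n


def estimate_period(seq: str, max_period: int = 1000, min_identity: float = 0.9):
    """Smallest offset whose off-diagonal reaches min_identity, or None.

    Hypothetical helper; min_identity = 0.9 is an illustrative cutoff only.
    """
    seq = seq.upper()
    for k in range(1, min(max_period, len(seq) // 2) + 1):
        if self_identity(seq, k) >= min_identity:
            return k
    return None


# A perfect (GATAGTTA)n array: the first strong off-diagonal is at offset 8.
print(estimate_period("GATAGTTA" * 20))  # 8
```

Real contigs carry mutated copies, so the cutoff trades sensitivity against spurious periods; a higher-order repeat may show strong diagonals both at its basic unit and at the higher-order period.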
Once all consensus sequences were retrieved, an all-against-all BLAST search was performed with an e-value threshold of 0.001 to find similarities among the satDNA families. The abundance and divergence of each satDNA family were calculated with RepeatMasker (http://www.repeatmasker.org, accessed on 9 December 2022), keeping the alignments (-a option) and using the RMBlast search engine. For this analysis, 5,000,000 randomly selected reads were aligned against the full collection of satDNA dimers (for families with a repeat unit longer than 100 bp) or of concatenated monomers of at least 200 bp (for families with a repeat unit shorter than 100 bp). Satellite DNA landscapes (abundance vs. divergence) were created by applying the Kimura 2-parameter (K-2p) model with the Perl script [56] and plotted with the ggplot2 package [57].
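The Kimura 2-parameter (K-2p) distance behind the landscapes treats transitions (A↔G, C↔T) and transversions separately. With P and Q the proportions of transition and transversion differences, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The minimal sketch below computes the textbook formula for one pair of aligned sequences (for example, a read segment against its consensus), skipping gaps and ambiguous bases. The RepeatMasker helper script may apply additional adjustments (for example, for CpG sites), so its values need not match this version exactly; multiply by 100 to obtain percentages as reported here.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Positions with gaps or non-ACGT characters are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return float("inf")  # saturated: distance undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One transition in ten sites.
print(round(100 * k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 2))  # 11.16
```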
For a deeper comparison with the sister species T. infestans, a re-analysis of its satellitome was performed using the available data from Pita et al. [26] but applying the same parameters used here for T. delpontei. In the new RepeatExplorer2 analysis, a custom database with all the satDNA families previously described in T. infestans [26], together with all satDNAs found in T. delpontei, was used.
For the comparative analysis between species, the profile of each satDNA family was generated from the raw paired-end reads of both species with the RepeatProfiler tool [58]. This tool creates coverage and base-composition profiles from Illumina sequencing data. As references, monomers were concatenated into trimers (or, for short monomers, into arrays of at least 200 bp). Default options were used, with the -p option to take the paired-end reads as input [58].
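The reference-building rules described for RepeatMasker (dimers, or at least 200 bp for short monomers) and for RepeatProfiler (trimers, or at least 200 bp) can both be expressed as one small function. `build_reference` is a hypothetical helper written from that description, not code from the study; with `min_copies=2` it follows the RepeatMasker rule and with the default `min_copies=3` the RepeatProfiler rule. The poly-A monomers below are placeholders used only to show lengths.

```python
import math


def build_reference(monomer: str, min_copies: int = 3, min_length: int = 200) -> str:
    """Tandem reference from a satDNA monomer (hypothetical helper).

    Uses at least `min_copies` copies, and enough copies to reach `min_length` bp.
    min_copies=3 -> trimers / >= 200 bp (RepeatProfiler setup in the text);
    min_copies=2 -> dimers / >= 200 bp (RepeatMasker setup in the text).
    """
    if not monomer:
        raise ValueError("empty monomer")
    copies = max(min_copies, math.ceil(min_length / len(monomer)))
    return monomer * copies


print(len(build_reference("CATA")))                    # 200 (50 copies)
print(len(build_reference("A" * 79)))                  # 237 (3 copies)
print(len(build_reference("A" * 1000, min_copies=2)))  # 2000 (dimer)
```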

Chromosome Preparation and Physical Mapping by Fluorescence In Situ Hybridization
Chromosome plates were obtained from fixed testes of T. delpontei, as described in Pita et al. [26]. The consensus sequences of the selected satDNA families were used to design a set of oligonucleotides, which were labeled with biotin-16-dUTP (Roche, Mannheim, Germany) using terminal transferase (Roche). The labeled oligonucleotides (Table 1) were used as FISH probes (final concentration of 5 ng/mL in 50% formamide) [26,59]. Fluorescent immunological detection was performed with the avidin-FITC/anti-avidin-biotin system with two amplification rounds. The slides were then counterstained with Vectashield-DAPI (Vector Laboratories, Burlingame, CA, USA). Images were taken with an Olympus® BX51 fluorescence microscope (Olympus, Hamburg, Germany) equipped with a CCD camera (Olympus® DP70) and processed with Adobe® Photoshop® software (Adobe Systems, San Jose, CA, USA).

Results
Low-coverage sequencing of the T. delpontei genome yielded 8 M paired-end 101 bp reads after quality filtering and trimming. RepeatExplorer2 clustering retrieved 1047 clusters, each containing at least 20 reads. Clusters identified as mitochondrial DNA sequences were excluded from the final annotation. Highly repetitive (top) clusters corresponded to 59.64% of the total genome. Cluster annotation led to their classification into five categories: long terminal repeats (LTR), non-long terminal repeats (non-LTR), class II elements, satellite DNA (satDNA) and ribosomal DNA (rDNA). Clusters whose identity could not be determined were labeled as undetermined repeats (unclassified) (Figure 1A). In order of abundance, the most frequent category was satDNA (51.07%), followed by class II elements (1.81%), LTR (1.42%), non-LTR (1.21%, including Penelope elements for simplicity) and rDNA (0.02%). Comparative analysis with T. infestans (Andean and non-Andean lineages) showed that satDNA is the major repeat fraction in all taxa and that these sequences are responsible for the differences in genome size between species (Figure 1B).
T. infestans data shown in Figure 1 were taken from Pita et al. [26].
Due to the great amount of satDNA, special emphasis was placed on determining the satellitome. A deeper search of the clusters led to the identification of 160 satDNA families in the T. delpontei genome (Table 2, Supplementary Table S1). Satellite DNA consensus sequences were deposited in the NCBI database (Acc. Numbers OQ82080-OQ82236). The satDNA families were named following Ruiz-Ruano et al. [45]: they were numbered according to their abundance in the genome, and each name includes the length of the repeat unit (TdelSat01-79 to TdelSat160-49). Monomer length in T. delpontei varied from 4 bp (TdelSat13-4-CATA) to 1000 bp (TdelSat05-1000), with most families shorter than 200 bp (Supplementary Table S1). The A+T content of the consensus sequences ranged between 41% (TdelSat59-63) and 86% (TdelSat111-7), and most families had an A+T content higher than 50%. Comparison among the consensus sequences of all satDNA families revealed that most families show no similarity to one another. However, several satDNAs share regions of high similarity, suggesting that they are evolutionarily related (Supplementary Figure S1). Most of these similarities involved TdelSat05-1000, which shares conserved regions with 11 other satDNA families.
Table 2. Data of the satDNA families found in T. delpontei: name, genome abundance in T. delpontei, presence in T. infestans, and genome abundance in the Andean and non-Andean T. infestans lineages.

The more detailed analysis of the T. infestans satellitome characterized 54 new families (Acc. Numbers OQ82237-OQ82289), which, together with the 42 previously described [26], bring the total to 96 satDNA families in the T. infestans genome (Supplementary Table S2). In addition, the abundance of every satDNA in the T. infestans Andean and non-Andean lineages was now quantified with RepeatMasker; previously, the proportion of each satellite had been calculated from the number of reads in the clusters obtained with RepeatExplorer [26]. The new satDNAs were numbered according to their abundance in the Andean lineage, as estimated by RepeatMasker. The new analysis changed some of the estimates for the 42 previously identified satDNAs, but their previous names were kept to avoid confusion. Most of the 96 satDNAs were present in both lineages (Supplementary Table S2); two satDNAs were detected only in the Andean lineage, and three only in the non-Andean lineage. RepeatMasker showed that 51 of these 96 T. infestans satDNAs are also present in T. delpontei (Table 2), although their abundance may differ between the two species.
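The naming convention used for both satellitomes (families ranked by genome abundance, with the monomer length appended) and the A+T percentages reported for each consensus can be illustrated with a short sketch. The helpers `at_content` and `name_families` are hypothetical; ties in abundance keep input order, and optional suffixes such as "-CATA" in TdelSat13-4-CATA are not generated. The abundances in the example are toy values, not data from this study.

```python
def at_content(seq: str) -> float:
    """Percentage of A+T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100 * (seq.count("A") + seq.count("T")) / acgt


def name_families(prefix: str, abundance: dict[str, float]) -> dict[str, str]:
    """Map each consensus sequence to <prefix>Sat<rank>-<monomer length>.

    Rank 1 is the most abundant family; ranks are zero-padded to at least
    two digits, so rank 160 is written as Sat160.
    """
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    return {seq: f"{prefix}Sat{rank:02d}-{len(seq)}" for rank, seq in enumerate(ranked, start=1)}


# Toy abundances, for illustration only.
names = name_families("Tdel", {"CATA": 1.0, "GATAGTTA": 2.0})
print(names)                   # {'GATAGTTA': 'TdelSat01-8', 'CATA': 'TdelSat02-4'}
print(at_content("GATAGTTA"))  # 75.0
```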
The satDNA families shared between T. delpontei and T. infestans are highly conserved in nucleotide sequence. In fact, for many satDNAs, the consensus sequences were identical or very similar in both species. The greatest differences were observed in three satDNA families: TinfSat03-4 vs. TdelSat02-8, TinfSat09-113 vs. TdelSat06-25 and TdelSat14-94 vs. TinfSat45-94 (Supplementary Figure S2). In T. infestans, the TinfSat03-4 satDNA is organized as (GATA)n arrays that include some variants of this sequence, such as GATAGTTA, GATAGATTA or GATAGGTA (Pita et al. 2017a). In T. delpontei, however, the TdelSat02-8 satDNA is mainly organized as tandem repeats of the sequence (GATAGTTA)n. This eight-nucleotide sequence, GATAGTTA, should therefore be considered the consensus sequence of this satDNA in T. delpontei (TdelSat02-8). Nevertheless, since GATA and GATAGTTA repeats are intermixed in the reads and the similarity between the sequences is 80% (GATAGATA vs. GATAGTTA), both could be considered the same satDNA. TinfSat09-113 is a higher-order repeat (HOR) that originated from a 25 bp sequence through duplications and insertions [26]. In T. delpontei, TdelSat06-25 forms arrays of this 25 bp sequence, also with variations due to duplications and insertions, but does not form an HOR as in T. infestans. TdelSat14-94 and TinfSat45-94 share only 60% similarity but contain an identical 38 bp region. The presence of this region suggests that they are evolutionarily related, although their sequences have diverged considerably between the two species.
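An identical stretch shared by two otherwise diverged consensus sequences, like the 38 bp region common to TdelSat14-94 and TinfSat45-94, can be located with a longest-common-substring search. The dynamic-programming sketch below is a hypothetical helper for illustration, not necessarily how the comparison was made in this study; it compares sequences on the same strand only, so reverse-complement matches would need a second call. The example sequences are toy strings.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest identical contiguous stretch shared by two sequences.

    curr[j] holds the length of the common run ending at a[i - 1] and b[j - 1].
    """
    a, b = a.upper(), b.upper()
    best_len = best_end = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("TTGATACCG", "AAGATACTT"))  # GATAC
```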
Both satellitomes presented here were also compared with that of another triatomine species, R. prolixus [31], which belongs to the tribe Rhodniini, whereas species of the genus Triatoma belong to the tribe Triatomini. A previous study showed that R. prolixus shares six satDNAs with T. infestans [31], among them the telomeric (TTAGG)n and the (GATA)n repeats. The new satellitome analysis of T. infestans revealed an additional satDNA family shared between these two species (TinfSat43-242 vs. RproSat05-208). All seven satDNAs are also present in T. delpontei (Table 2).
The divergence of each satDNA family was calculated with RepeatMasker. In T. delpontei, divergence values ranged from 0.04% (TdelSat85-49) to 24.02% (TdelSat25-138), with a median of 5.62% (Supplementary Table S1). A similar range was found for Andean and non-Andean T. infestans (Supplementary Table S2), as well as for R. prolixus [31]. In non-Andean T. infestans, values ranged from 1.29% (TinfSat05-4) to 23.45% (TinfSat18-102), with a median of 9.29%. In the Andean T. infestans lineage, values ranged from 0.64% (TinfSat05-4) to 19.53% (TinfSat12-84), with a median (8.55%) close to that of the non-Andean lineage. In R. prolixus, the nucleotide divergence of all satDNA families ranged between 0.88% (RproSat32-59) and 28.28% (RproSat13-293), with a median of 10.03% [31]. Nevertheless, the overall satellite DNA landscape of T. delpontei peaks at 9% divergence (x-axis in Figure 2), whereas the curves for T. infestans peak at 4% and 5% in the non-Andean and Andean lineages, respectively (Figure 3). The shape of the curves is also completely different, as T. infestans shows a positively skewed distribution. This difference can be largely explained by the most abundant family in T. delpontei, TdelSat01-79, which is also one of the most abundant in T. infestans (TinfSat02-79). In T. delpontei, this family is highly divergent (12.73%), with a peak at 10% in the satellite DNA landscape (Figure 4), whereas in T. infestans its divergence was 4.79% and 3.71%, with landscape peaks at 2% and 6% in non-Andean and Andean T. infestans, respectively, indicating a much more conserved family (Figure 4). The data for all satDNA families are given in Supplementary Tables S1 and S2, and individual satellite DNA landscapes for each satDNA family are shown in Supplementary Figures S3-S5.
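A satellite DNA landscape is essentially a histogram: each aligned copy contributes its length to the bin of its K-2p divergence, and bin totals are scaled to genome size, so the peak is simply the bin with the largest total. In this study, divergences came from RepeatMasker alignments processed with the Perl script [56] and plotted with ggplot2 [57]; the Python sketch below only illustrates the binning and peak-finding logic, assuming the reads are a random sample of the genome. The function names, the 1% bin width and the toy hits are assumptions.

```python
from collections import defaultdict


def landscape(hits, sampled_bp: int, genome_mb: float, bin_width: float = 1.0) -> dict:
    """Bin per-hit divergence into a satellite DNA landscape.

    hits: iterable of (k2p_divergence_percent, aligned_bp) pairs, one per read hit.
    Returns {bin_start_percent: abundance_in_Mb}, scaling the sampled read bases
    up to the genome size (assumes the reads are a random genome sample).
    """
    bins = defaultdict(int)
    for divergence, aligned_bp in hits:
        bins[int(divergence // bin_width) * bin_width] += aligned_bp
    scale = genome_mb / sampled_bp
    return {start: bp * scale for start, bp in sorted(bins.items())}


def peak_divergence(profile: dict) -> float:
    """Divergence bin holding the largest abundance (the landscape peak)."""
    return max(profile, key=profile.get)


# Toy hits scaled to the T. delpontei C-value (2836 Mb).
profile = landscape([(9.4, 100), (9.8, 100), (4.1, 50)], sampled_bp=1000, genome_mb=2836)
print(peak_divergence(profile))  # 9.0
```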
Above: satellite abundance (Mb) estimated by RepeatMasker using each species' consensus sequences in the three analyzed genomes, and satellite DNA landscape plots (abundance in Mb vs. divergence in percentage). Each calculation was performed as above, using each species' consensus sequences against its own genome reads. Bottom: RepeatProfiler variation analyses for the three analyzed genomes, using the T. delpontei consensus sequences.
As noted above, the satDNAs shared between T. delpontei and T. infestans differ greatly in abundance between the two species. Figure 4 shows the satellite DNA landscapes and absolute amounts of the most abundant satDNAs of T. delpontei, together with the same data for these satDNAs in the Andean and non-Andean T. infestans lineages. The main satDNA of T. delpontei (TdelSat01-79, 18.16% of the genome) is also present in T. infestans (TinfSat02-79), where it makes up an important part of the Andean lineage genome (9.30%) and is the main satDNA of the non-Andean lineage (11.13%). Because C-values differ considerably among these genomes (T. delpontei 2836 Mb, Andean T. infestans 1936 Mb, non-Andean T. infestans 1487 Mb), the absolute amounts of each satDNA were compared (Figure 4A). In both T. infestans lineages, TinfSat02-79 amounts to between 150 and 200 Mb, whereas TdelSat01-79 has been strongly amplified in the T. delpontei genome, exceeding 500 Mb. GATA repeats (TdelSat02-8 vs. TinfSat03-4) are highly amplified in both Triatoma species, representing 14.21% of the T. delpontei genome and 3.20% and 6.24% of the non-Andean and Andean T. infestans genomes, respectively. These results indicate that GATA repeats were intensely amplified in the Andean T. infestans genome (above 150 Mb) relative to non-Andean T. infestans (above 50 Mb), and especially in T. delpontei (above 250 Mb) (Figure 4B). Other highly amplified satDNAs of T. delpontei occur in very low proportions in T. infestans. For example, TdelSat03-10 and TdelSat04-53 represent 12.79% and 3.15% of the T. delpontei genome (about 360 and 90 Mb, respectively), but their equivalents in T. infestans do not reach 0.1% of the genome (Figure 4C,D). The absolute amounts of TdelSat05-1000 and TinfSat04-1000 differ less, and this satDNA is slightly more amplified in the T. infestans genome than in T. delpontei.
In T. delpontei, this satDNA family represents 2.50% of the genome (71 Mb), compared with 6.47% and 4.93% (96 and 95 Mb) in non-Andean and Andean T. infestans, respectively (Figure 4E). For the next most abundant satDNA (TdelSat06-25 vs. TinfSat09-113), there are important differences between the two species (Figure 4F): the amount of this satDNA in T. delpontei (0.437%, 12.4 Mb) is 10-fold that in Andean T. infestans (0.064%, 1.23 Mb) and more than 2-fold that in non-Andean T. infestans (0.358%, 5.32 Mb).
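The absolute amounts and fold differences quoted above follow from multiplying each genome percentage by the corresponding C-value. A minimal sketch for TdelSat06-25/TinfSat09-113, using only the percentages and C-values given in the text; because the input percentages are rounded, the last digit of a product may differ slightly from the value quoted in the text.

```python
# C-values (Mb) as given in the text.
GENOME_MB = {"T. delpontei": 2836, "Andean T. infestans": 1936, "non-Andean T. infestans": 1487}


def absolute_mb(percent: float, genome_mb: float) -> float:
    """Convert a genome percentage into megabases."""
    return percent / 100 * genome_mb


# TdelSat06-25 / TinfSat09-113 genome percentages from the text.
percent = {"T. delpontei": 0.437, "Andean T. infestans": 0.064, "non-Andean T. infestans": 0.358}
mb = {taxon: absolute_mb(p, GENOME_MB[taxon]) for taxon, p in percent.items()}

for taxon, value in mb.items():
    print(f"{taxon}: {value:.1f} Mb")  # 12.4, 1.2 and 5.3 Mb
print(f"fold vs Andean: {mb['T. delpontei'] / mb['Andean T. infestans']:.1f}")          # 10.0
print(f"fold vs non-Andean: {mb['T. delpontei'] / mb['non-Andean T. infestans']:.1f}")  # 2.3
```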
The T. delpontei consensus sequences of all satDNA families were included in the RepeatProfiler analysis. This analysis allows the visualization of coverage-depth profiles for repetitive sequences and of the sequence variation relative to a consensus sequence. Profiles for all satDNAs are shown in Supplementary Figure S6. This analysis revealed that the number of satDNAs shared between T. delpontei and T. infestans is much higher than the 51 families detected with RepeatExplorer. In fact, most of the satellites are present in both species and, within T. infestans, in both lineages. RepeatProfiler identified an additional 68 satDNA families shared between T. delpontei and T. infestans, raising the number of shared families to 119. For these 68 families, RepeatProfiler showed strong differences in abundance between the two species. The low abundance of these satDNAs in T. infestans may explain why they were not detected with RepeatExplorer, in which only repeats with abundances higher than 0.001% were selected. The failure of RepeatMasker to detect them could be due to the small read sample used to evaluate the genome, and differences in how each program identifies repeats may also contribute. It cannot be ruled out that the 41 satDNA families that seem to be present only in T. delpontei are, in fact, also present in T. infestans, but at such low abundance that not even RepeatProfiler could detect them.
Figure 4 shows the satellite DNA landscapes and RepeatProfiler results for the six most abundant satDNAs of T. delpontei and for the equivalent satDNAs in T. infestans. The satellite DNA landscape of TdelSat01-79 vs. TinfSat02-79 shows that this satDNA is more variable in T. delpontei than in T. infestans. RepeatProfiler revealed a large number of variable positions along the repeat sequence in T. delpontei, whereas the sequence was more conserved in T. infestans, with both lineages showing a similar pattern of variation. These results are consistent with the divergence values estimated for these satDNAs: 12.73% for TdelSat01-79, and 3.71% and 4.79% for Andean and non-Andean TinfSat02-79, respectively (Supplementary Tables S1 and S2). Similar results were obtained for other satDNA families, such as TdelSat04-53 (Figure 4D). For some satDNAs, however, RepeatProfiler showed similar nucleotide variation in both species despite their different abundances, as with TdelSat02-8, TdelSat03-10, TdelSat05-1000 and TdelSat06-25 (Figure 4B,C,E,F).
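RepeatProfiler summarizes, for each position of a reference, how many reads cover it and how often the read base differs from the consensus. The minimal sketch below reproduces only that counting step, assuming reads have already been mapped to the reference without gaps; the mapping itself and RepeatProfiler's normalization and plotting are not reproduced. `variation_profile` is a hypothetical name, and the reads in the example are toy data.

```python
def variation_profile(reference: str, placed_reads):
    """Per-position coverage and fraction of non-reference bases.

    placed_reads: iterable of (offset, read_sequence) pairs; each read is assumed
    to be aligned to `reference` without gaps, starting at `offset`.
    Returns a list of (coverage, variant_fraction) tuples, one per position.
    """
    coverage = [0] * len(reference)
    mismatches = [0] * len(reference)
    for offset, read in placed_reads:
        for k, base in enumerate(read.upper()):
            pos = offset + k
            if pos < 0 or pos >= len(reference) or base not in "ACGT":
                continue  # ignore overhangs and ambiguous calls
            coverage[pos] += 1
            if base != reference[pos].upper():
                mismatches[pos] += 1
    return [(cov, mis / cov if cov else 0.0) for cov, mis in zip(coverage, mismatches)]


# Toy reads on a GATAGTTA reference; one read carries an A at position 5.
profile = variation_profile("GATAGTTA", [(0, "GATAGTTA"), (0, "GATAGATA"), (4, "GTTA")])
print(profile[5])  # (3, 0.3333333333333333)
```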
As previously reported [15], C-banding in T. delpontei revealed heterochromatic blocks at one chromosomal end of all autosomes and of the X chromosome, reaching almost half of the chromosome length (Figure 5A), while the Y chromosome is entirely heterochromatic. In contrast, T. infestans autosomes show prominent C-heterochromatic regions at both ends, and this pattern varies between lineages: the Andean lineage bears C-heterochromatin on 7 to 10 autosomal pairs plus the X chromosome, whereas the non-Andean lineage has only three autosomal pairs with C-banded regions. The Y chromosome is entirely heterochromatic in both lineages [6,7,26]. The major satDNA families (TdelSat01-79 to TdelSat06-25) were physically mapped on T. delpontei chromosomes. The two main families (TdelSat01-79 and TdelSat02-8) were located in the heterochromatin of both autosomes and sex chromosomes (Figure 5B,C). TdelSat03-10 and TdelSat04-53 were also building blocks of the heterochromatin of the autosomes and the X chromosome, but were absent from the Y chromosome (Figure 5D,E). TdelSat05-1000 and TdelSat06-25 were restricted to euchromatic regions (Figure 5F,G). FISH was also carried out with two other satDNAs. The first was TinfSat01-33, the main satDNA of T. infestans, which was located in the euchromatic regions of the autosomes and the X chromosome (Figure 5I). The second was the CATA repeat (TdelSat13-4). In T. infestans, this repeat (TinfSat05-4) is located in the heterochromatic regions of the autosomes. In T. delpontei, this satDNA is also present in the autosomal heterochromatin, but additionally in the heterochromatin of both sex chromosomes (Figure 5H). Notably, for satDNAs located mainly in the heterochromatin, weaker hybridization signals were also observed in the autosomal euchromatic regions.

Discussion
Hitherto, repetitive DNA in triatomines has been analyzed in only a few species [26,31,60,61]. The T. delpontei repeatome analysis showed that the most abundant component of its genome is repetitive DNA. A similar situation is observed in other invertebrate genomes in which the repetitive fraction represents nearly or more than half of the genome, such as Tenebrio molitor Linnaeus, 1758 (Coleoptera: Tenebrionidae) [62], Pontastacus leptodactylus Eschscholtz, 1823 (Decapoda: Astacidae) [49], Octopus vulgaris Cuvier, 1797, Octopus bimaculoides Pickford & McConnaughey, 1949 or Architeuthis dux Steenstrup, 1857 (Cephalopoda: Coleoidea) [63]. In T. delpontei, this scenario was expected, since C-banding reveals that heterochromatin occupies almost half of each autosome and of the X chromosome, as well as the entire Y chromosome. However, a striking and unique result was that more than half of the genome consists of satDNA sequences alone (Figure 1). Four satDNA families located in the heterochromatin represent more than 48% of the T. delpontei genome.
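The 48% figure can be checked by adding the genome percentages reported in the Results for the four heterochromatic families; multiplying by the T. delpontei C-value (2836 Mb) gives their combined size. This is simple arithmetic on the published values, shown as a sketch.

```python
# Genome percentages of the four heterochromatic satDNA families (Results).
heterochromatic = {
    "TdelSat01-79": 18.16,
    "TdelSat02-8": 14.21,
    "TdelSat03-10": 12.79,
    "TdelSat04-53": 3.15,
}
total = sum(heterochromatic.values())
print(f"{total:.2f}% of the genome")      # 48.31% of the genome
print(f"{total / 100 * 2836:.0f} Mb")     # 1370 Mb
```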
Regarding the genus Triatoma, the present results depict the great importance of satDNA sequences in genome evolution. Comparison between sister species, such as T. delpontei and both lineages of T. infestans, showed that satDNA is the main repetitive fraction of their genomes. Additionally, a detailed analysis of this fraction showed that, within the infestans sub-complex species, the heterochromatin is mainly formed by only a few different satDNA families. This trend has also been reported for other Heteroptera species, such as Holhymenia histrio Fabricius, 1803 [41], as well as for other insect groups, such as Coleoptera [38,39,64] and Hymenoptera [44,65]. The results show that the large genome of T. delpontei, the largest reported so far in Heteroptera (Sadilek et al. [22] and references therein), is mainly due to a strong increase in a few satDNA families. A similar situation was observed in T. infestans, where a few satDNA families located mainly in the heterochromatin were responsible for the variation in genomic DNA content between lineages [26]. Whether this pattern is universal for the genus Triatoma remains to be determined, and analyses of other species are needed to test this hypothesis.
T. delpontei heterochromatin is formed mainly by just four satDNA families. Two of these satDNAs, TdelSat01-79 and TdelSat02-8, are present in the heterochromatin of both species, although their abundances are much higher in T. delpontei than those of their homologs in T. infestans (TinfSat02-79 and TinfSat03-4) (Figure 4). The other two abundant satDNAs of the T. delpontei heterochromatin (TdelSat03-10 and TdelSat04-53) were also found in the T. infestans genome (TinfSat07-10/36-10 and TinfSat10-53), although in the latter species their abundances were much lower and they were located in the euchromatin. Conversely, some satDNAs located in the euchromatin in T. delpontei, such as TinfSat01-33, are part of the heterochromatin in T. infestans. It is important to note that none of RepeatExplorer, RepeatMasker or RepeatProfiler was able to detect TinfSat01-33 in the T. delpontei NGS reads, although this satDNA family could be mapped by FISH (Figure 5I). This could be due to inter-population variability within T. delpontei, but we cannot rule out that the amount or the divergence of this satDNA in the T. delpontei genome hampers its detection by the software. For example, this satDNA could be so scarce in the genome that it falls below the detection limit of low-pass genome sequencing. It is also possible that an elevated nucleotide divergence in T. delpontei relative to T. infestans prevents its detection by the software used. Lastly, a sequence similar to TinfSat01-33 belonging to another repetitive element (e.g., a TE) could be responsible for this FISH result.
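The detection-limit argument can be made concrete under a simple sampling model. Assuming reads are drawn at random from the genome and that the number of reads from a family follows a Poisson distribution (an approximation that ignores read length, monomer length and mapping sensitivity), the expected count λ is the family's genome fraction times the number of reads, and the probability of sampling none is exp(−λ). The read numbers are those given in the Methods (16,000,000 for RepeatExplorer2 and 5,000,000 for RepeatMasker); the abundances are hypothetical.

```python
import math


def expected_reads(genome_fraction: float, reads_sampled: int) -> float:
    """Expected reads drawn from a family occupying `genome_fraction` of the genome."""
    return genome_fraction * reads_sampled


def prob_no_reads(genome_fraction: float, reads_sampled: int) -> float:
    """Poisson probability that the sample contains no read from the family."""
    return math.exp(-expected_reads(genome_fraction, reads_sampled))


# Read-sample sizes from Methods; the abundances are hypothetical examples.
for fraction in (1e-5, 1e-6, 1e-7):  # 0.001%, 0.0001% and 0.00001% of the genome
    for reads in (16_000_000, 5_000_000):
        lam = expected_reads(fraction, reads)
        print(f"{fraction:.0e} of genome, {reads:,} reads: "
              f"{lam:.1f} expected, P(none) = {prob_no_reads(fraction, reads):.3f}")
```

Under this model, a family at 0.001% of the genome contributes about 160 of 16,000,000 reads, whereas one a hundred times rarer is missed entirely from a 5,000,000-read sample with probability of roughly 0.6.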
Interestingly, CATA repeats are present in the genomes of both T. infestans and T. delpontei (TdelSat13-4 vs. TinfSat05-4), but with important differences in abundance and chromosomal location. In T. infestans, this satDNA is located in the heterochromatin of the autosomal bivalents [26]. In T. delpontei, CATA repeats have also spread through the heterochromatin of all autosomes, and, notably, their expansion has also involved the sex chromosomes.
T. delpontei and T. infestans share most of their satDNA families, although their abundances differ between the two species. The number of shared families is lower when these genomes are compared with those of species from other genera, such as R. prolixus. These results agree with the "library hypothesis", which states that related species share a set of ancestral satDNA families; hence, closely related species would tend to share a larger proportion of satDNA families than more distant ones [66,67]. According to the FISH results in triatomines, low-abundance satDNAs are dispersed in the euchromatin. Moreover, even satDNAs that build up the heterochromatin are also found in euchromatic regions. These "euchromatic" satDNAs are probably organized in arrays with fewer copies. Hence, this set of repetitive sequences dispersed in the euchromatin would probably constitute the satDNA "library". Stochastic mechanisms would move some of these satDNAs to the heterochromatin, where they could be highly amplified. This might have occurred with the TdelSat03-10 and TdelSat04-53 families, which are abundant in the T. delpontei heterochromatin but occur at low proportion in the euchromatin of T. infestans. The opposite happened with TinfSat01-33, which is the main satDNA of T. infestans but occurs at a much lower proportion in the euchromatin of T. delpontei.
Large differences can also be observed in the amount of satDNA located in euchromatic chromosomal regions, as with TdelSat05-1000 vs. TinfSat04-1000 or TdelSat06-25 vs. TinfSat09-113 (Figure 4E,F). There are no data to explain why a given satDNA family increases or decreases its proportion in the genome. Mechanisms that generate abundance variation in other tandem repeats, such as unequal crossing over or replication slippage, may be responsible for such variation. It cannot be ruled out that mechanisms such as ectopic recombination are related to the sticky chromatin observed during T. delpontei and T. infestans meiosis (Panzera et al. 1995). Variation in the amount of a particular satDNA family may occur between species, but also at the population level, as observed between the Andean and non-Andean T. infestans lineages [26]. Such intraspecific differences have also been observed in other insect groups [39,45,68-70].
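Unequal crossing over, one of the candidate mechanisms mentioned above, changes copy number in a characteristic way: a misaligned exchange between two tandem arrays makes one product longer and the other shorter by the same number of repeat units. The toy simulation below is a minimal sketch of that idea only. It assumes a single pair of homologous arrays and breakpoints drawn independently and uniformly in each array (a deliberately crude assumption), and it ignores selection, replication slippage and ectopic exchange between non-homologous arrays; the function names are hypothetical and it is not a model fitted to Triatoma data.

```python
import random


def unequal_crossover(len_a: int, len_b: int, rng: random.Random) -> tuple[int, int]:
    """One misaligned exchange between two homologous tandem arrays.

    A breakpoint is drawn independently in each array; the products swap the
    segments distal to the breakpoints, so one array gains exactly the copies
    the other loses and the combined copy number is unchanged.
    """
    cut_a = rng.randint(0, len_a)  # repeat units of A proximal to its breakpoint
    cut_b = rng.randint(0, len_b)  # repeat units of B proximal to its breakpoint
    return cut_a + (len_b - cut_b), cut_b + (len_a - cut_a)


def simulate(initial_copies: int, events: int, seed: int = 1) -> list[int]:
    """Track the copy number of one array over repeated exchanges with its homolog."""
    rng = random.Random(seed)
    a = b = initial_copies
    history = [a]
    for _ in range(events):
        a, b = unequal_crossover(a, b, rng)
        history.append(a)
    return history


if __name__ == "__main__":
    trajectory = simulate(initial_copies=100, events=50, seed=7)
    print("start:", trajectory[0], "min:", min(trajectory),
          "max:", max(trajectory), "end:", trajectory[-1])
```

Because each exchange conserves the total number of units in the pair, the copy number of any single array drifts up and down rather than changing in one direction; sustained expansion of a family would require additional processes, such as repeated amplification or fixation of the longer variants.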
To obtain a better picture of satDNA dynamics between species, we compared repeat landscapes, which showed that the divergence peak of the T. delpontei genome lies at a higher value (9%) than in T. infestans (4% and 5% in the non-Andean and Andean lineages, respectively). The most abundant satDNA family in T. delpontei (TdelSat01-79) is largely responsible for this trait (see Results section). Although clustered repeats tend to be homogenized, the extremely large arrays in T. delpontei could be hampering the homogenization process, so that mutations accumulate at the ends of the arrays and increase the divergence. Nevertheless, the median divergence of T. delpontei (5.62%) is lower than that of other insects [38,39,45,71,72], including both T. infestans lineages (9.29% and 8.55% for non-Andean and Andean, respectively) and the variable R. prolixus satellitome, with a median of 10.03% [31]. Because satDNA divergence increases with mutation rate and decreases with amplification and homogenization [73,74], the lower median value of the T. delpontei satellitome suggests that its genome is more prone to homogenization than those of both T. infestans lineages, probably due to the presence of larger arrays within the euchromatic regions.
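The landscape peak and the median summarize different aspects of the same divergence distribution, which is why T. delpontei can show a higher peak (9%) yet a lower median (5.62%) than T. infestans. The sketch below illustrates this with two small hypothetical sets of per-copy divergence values (for example, distances of each repeat copy to its family consensus, as commonly used for repeat landscapes). The values and the 1% bin width are assumptions for illustration, not data from this study.

```python
from collections import Counter
from statistics import median


def landscape_peak(divergences: list[float], bin_width: float = 1.0) -> float:
    """Lower edge (%) of the most populated bin of a divergence histogram.

    Ties are broken toward the lower-divergence bin.
    """
    counts = Counter(int(d // bin_width) for d in divergences)
    top_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return top_bin * bin_width


# Hypothetical per-copy divergence values (%), NOT data from this study.
genome_x = [1.2, 1.5, 2.3, 2.8, 3.1, 9.1, 9.2, 9.4, 9.6]
genome_y = [4.1, 4.3, 4.5, 7.0, 8.2, 10.5, 12.0, 14.3, 15.8]

for name, values in (("X", genome_x), ("Y", genome_y)):
    print(name, "peak:", landscape_peak(values), "median:", median(values))
# X peak: 9.0 median: 3.1
# Y peak: 4.0 median: 8.2
```

In this toy case, a single large and relatively diverged family dominates the peak of genome X while most of its other copies are nearly identical, which keeps its median low. That pattern is qualitatively similar to the one described for TdelSat01-79.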
The T. delpontei genome is therefore built from a huge amount of satDNA sequences, which constitute more than half of it. Its C-value is more than twice that of other Triatoma species: most Triatoma species have a genome size of around 1222 Mb, whereas that of T. delpontei is 2836 Mb [20,21]. Within this scenario, satellitome determination and analysis could lead to a hypothesis that explains how satDNA sequences have grown. Interestingly, the T. infestans genome shows a singular variability in its DNA content, with a non-Andean genome of 1487 Mb and an Andean genome of 1936 Mb. Hence, the variation within T. infestans could be taken as an intermediate step toward the configuration of the outstanding T. delpontei genome. As discussed above, all three genomes share a common trait: satDNA is the major component of their repetitive DNA. Furthermore, satDNA families are mostly shared among them and are responsible for the size variation. Notably, the strong amplification of satDNA families that invaded the heterochromatin led to the hypothesis that the genomic environment of the subtelomeric regions of Triatoma could be influencing this phenomenon. Therefore, repeats that reach subtelomeres would be prone to expansion, and ectopic recombination between non-homologous subtelomeres could enhance this process by spreading the repeats. Given the importance of repetitive sequences in Triatoma karyotypic evolution, further investigation is essential. Moreover, this topic could help elucidate the mechanisms involved in speciation within Triatoma, a speciose genus that encompasses 82 of the more than 150 described Triatominae species.
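The genome sizes quoted above support a simple arithmetic check of the argument. The sketch below uses only the values given in the text (with "around 1222 Mb" treated as a representative Triatoma genome size) and the stated lower bound of one half for the satDNA fraction of T. delpontei, so the derived satDNA amount is a minimum. The variable names are illustrative.

```python
# Genome sizes (Mb) quoted in the text [20,21]; the satDNA fraction is the
# stated lower bound ("more than half"), so derived satDNA sizes are minima.
TYPICAL_TRIATOMA_MB = 1222  # approximate value for most Triatoma species
T_DELPONTEI_MB = 2836
T_INFESTANS_NON_ANDEAN_MB = 1487
T_INFESTANS_ANDEAN_MB = 1936
SATDNA_FRACTION_LOWER_BOUND = 0.5

fold_increase = T_DELPONTEI_MB / TYPICAL_TRIATOMA_MB
extra_mb = T_DELPONTEI_MB - TYPICAL_TRIATOMA_MB
satdna_min_mb = T_DELPONTEI_MB * SATDNA_FRACTION_LOWER_BOUND
andean_excess_mb = T_INFESTANS_ANDEAN_MB - T_INFESTANS_NON_ANDEAN_MB

print(f"T. delpontei / typical Triatoma: {fold_increase:.2f}-fold")   # 2.32-fold
print(f"Extra DNA in T. delpontei: {extra_mb} Mb")                     # 1614 Mb
print(f"satDNA in T. delpontei: > {satdna_min_mb:.0f} Mb")             # > 1418 Mb
print(f"Andean minus non-Andean T. infestans: {andean_excess_mb} Mb")  # 449 Mb
print("satDNA alone exceeds a typical Triatoma genome:",
      satdna_min_mb > TYPICAL_TRIATOMA_MB)                              # True
```

Even at its lower bound, the satDNA fraction of T. delpontei (more than 1418 Mb) is larger than an entire typical Triatoma genome. This is consistent with satDNA amplification being the main driver of its genome expansion.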
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes14020371/s1. Table S1: data of the satDNA families found in T. delpontei; Table S2: data of the satDNA families found in T. infestans; Figure S1: alignments of the consensus sequences between different satDNA families of T. delpontei; Figure S2: alignments of the consensus sequences for the shared satDNA families between T. delpontei and T. infestans; Figure S3: satellite DNA landscapes for all satDNA families found in Triatoma delpontei; Figure S4: satellite DNA landscapes for all satDNA families found in Andean Triatoma infestans; Figure S5: satellite DNA landscapes for all satDNA families found in non-Andean T. infestans; Figure S6: RepeatProfiler results for the analyses of all satDNAs found in T. delpontei over three genomes: T. delpontei, Andean and non-Andean T. infestans lineages.
Institutional Review Board Statement: No special permits were required to retrieve and process the samples because this study did not involve any live vertebrates or regulated invertebrates.

Informed Consent Statement: Not applicable.
Data Availability Statement: All satDNA consensus sequences were submitted to the NCBI database (Acc. Numbers OQ82080-OQ82289).

Conflicts of Interest:
The authors declare no conflict of interest.