Rhabdoviral Endogenous Sequences Identified in the Leishmaniasis Vector Lutzomyia longipalpis Are Widespread in Sandflies from South America

Sandflies are known vectors of leishmaniasis. In the Old World, sandflies are also vectors of viruses, while little is known about the capacity of New World insects to transmit viruses to humans. Here, we report the identification of RNA sequences with homology to rhabdovirus nucleocapsid (NcP) genes, named NcP1.1 and NcP2, initially in the Lutzomyia longipalpis LL5 cell lineage. Members of the Rhabdoviridae family never retrotranscribe their RNA genome to DNA. The sequences described here were identified in cDNA and DNA from LL5 cells and in adult insects, indicating that they are transcribed endogenous viral elements (EVEs). The presence of NcP1.1 and NcP2 in the L. longipalpis genome was confirmed in silico. In addition to showing the genomic location of NcP1.1 and NcP2, we identified another rhabdoviral insertion, named NcP1.2. Analysis of small RNA molecules derived from these sequences showed that NcP1.1 and NcP1.2 present a profile consistent with elements targeted by primary piRNAs, while NcP2 was restricted to the degradation profile. The presence of NcP1.1 and NcP2 was investigated in sandfly populations from South America and the Old World. These EVEs are shared by different sandfly populations in South America, while none of the Old World species studied presented the insertions.


Introduction
Leishmaniasis, caused by parasites of the genus Leishmania, is a serious public health problem. Parasites are transmitted through the bite of infected sandflies and can cause different disease manifestations. Among these, visceral leishmaniasis (VL) is the most severe form and can lead to death in untreated cases [1]. In the New World, VL is mostly caused by Leishmania infantum [2], which is transmitted by the sandfly Lutzomyia longipalpis. Lutzomyia, the most important New World sandfly genus in terms of species diversity and medical importance, is widely distributed across South and Central America [3][4][5].
Sandflies are mostly known for transmitting leishmaniasis but can also harbor and transmit viruses [6]. Although in Europe insects of the genus Phlebotomus are important viral vectors [7], little is known about sandfly-borne viruses in the Americas. One example is the vesicular stomatitis virus, which infects humans and domestic animals and is widely endemic in the New World [8]. There are reports of the isolation of arboviruses from sandflies in several areas of the Amazon [9,10]. Five new phleboviruses were isolated from sandflies collected in Brazil, Colombia and Guatemala [11]. In 2020, a mitovirus was identified in L. longipalpis through in silico analyses and RT-PCR experiments [12].
One of the consequences of viral infections is the possibility of partial or total integration of the viral genetic material into the host genome. Until the last decade, it was believed that this ability was exclusive to retroviruses. Retroviruses have a positive-sense single-stranded RNA genome that, during infection, is retrotranscribed into complementary DNA (cDNA) and incorporated into the genome of infected cells by an integrase enzyme. These retroviral sequences inserted into the host genome are known as endogenous retroviruses (ERVs). ERVs are very common in vertebrate genomes, making up about 8% of the human genome [13]. In 2010, Horie et al. identified viral elements of nonretroviral origin in the genomes of several mammals. After that, the integration of different viruses into the genomes of several eukaryotes, including invertebrates and plants, was shown [14][15][16][17][18]. In silico studies identified endogenous viral elements (EVEs) in arthropods used as model organisms or of medical interest. These studies were carried out on the mosquitoes Aedes aegypti and Culex quinquefasciatus, the tick Ixodes scapularis and the sandflies L. longipalpis, Phlebotomus duboscqi and Sergentomyia sp., showing the occurrence of fragments of rhabdoviruses and other viruses integrated into their genomes [19,20]. In a study in which the genomes of 48 different arthropods were analyzed, researchers observed that EVEs were widely present in these organisms. The majority came from the integration of unclassified single-stranded RNA (ssRNA) viruses and viruses from the Rhabdoviridae and Parvoviridae families, and most of these EVEs were located in piRNA clusters transcribing piRNAs [21].
EVEs can be considered fossils of past viral infections that have remained in the genomes of their hosts and descendants over time. The integration of exogenous viral sequences into the genome can be deleterious, neutral, or beneficial to the host. Integrations in germ line cells that are deleterious to the host may be lost within a single host generation, and only slightly deleterious, neutral or advantageous insertions may spread through the host population. For this reason, EVEs can be used as markers for phylogenetic analysis between different populations [19]. Some EVEs that have been conserved and selected over long evolutionary timescales can confer new essential functions on their hosts. This phenomenon is known as exaptation: a molecule or structure takes on a function different from the one it originally served. Increasing evidence in insects shows that most EVEs are transcriptionally active and produce small interfering RNA sequences (sRNAs) [22]. Although the exact function of EVEs is so far unknown, some studies suggest that EVEs may interfere with virus replication by producing PIWI-interacting RNA sequences (piRNAs) that recognize and degrade viral RNA sequences through sequence complementarity [23,24]. The exaptative process involving EVEs is exemplified in these cases by the change in function of sequences that were previously related to the expression of viral proteins and that have now become part of the host's immune system.
Viruses 2024, 16, 395
In the present work, we detected two viral RNA sequences coding for rhabdoviral nucleocapsid proteins in exosomes from L. longipalpis LL5 cells and in adults from our colony, established from insects collected in Jacobina, BA, Brazil. Subsequently, we determined that these viral sequences were not derived from virus infection but were the product of the transcription of viral elements inserted in intronic regions of two putative protein-coding genes in the L. longipalpis genome, and they were therefore classified as endogenous viral elements (EVEs). These EVEs were named NcP1.1 and NcP2. A subsequent in silico analysis showed the existence of a third viral insertion, located in the same intronic region as NcP1.1. This new EVE was called NcP1.2. Analysis of the pre-processed small RNA library revealed that NcP1.1 and NcP1.2 presented a profile consistent with elements targeted by primary piRNAs and therefore could play a role in sandfly immunity. This profile was not identified for NcP2. We also investigated whether other sandfly populations and species in the New and Old Worlds shared these EVEs. We observed that diverse sandflies from different regions of South America shared these EVEs. None of the Phlebotomus species from the Old World studied here showed such EVEs.

Sandflies
L. longipalpis from our colony, originally collected in Jacobina (Bahia, Brazil), were kept in our insectary at 26 °C and fed on 70% sucrose solution ad libitum. Females were fed on anesthetized hamsters when needed. Sandflies collected in the field were immediately placed in TRIzol® reagent (Invitrogen®, Carlsbad, CA, USA) for processing. Sandflies from the Old World were kept as described in [25].

RNA and DNA Extractions
RNA and DNA were extracted using the TRIzol® reagent (Invitrogen®, Carlsbad, CA, USA), according to the manufacturer's instructions. RNA was stored at −80 °C and DNA at −20 °C.

cDNA Synthesis
After RNA extraction, samples were checked for contamination with genomic DNA and, when it was present, treated with the RNase-free DNase of the TURBO DNA-free Kit (Ambion, Austin, TX, USA). cDNA was synthesized using the SuperScript® III First-Strand Synthesis System kit (Invitrogen®, Carlsbad, CA, USA), according to the manufacturer's recommendations.

Polymerase Chain Reactions (PCR)
Primers (Table 1) were designed using the Oligonucleotide-BLAST (NCBI-NIH) and AmplifX programs, available at https://www.ncbi.nlm.nih.gov/tools/primer-blast/ (accessed on 30 March 2023) and https://amplifx.Software.informer.com/1.7/ (accessed on 30 March 2023), respectively. Viral sequences were detected in cDNA samples by conventional PCR under the following conditions: initial denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 56 °C for 30 s and extension at 72 °C for 30 s, followed by an additional extension step of 5 min at 72 °C. Reactions with DNA templates were run under the following conditions: initial denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 45 s, annealing at 57 °C for 45 s and extension at 72 °C for 1 min 30 s, and then an additional extension step of 5 min at 72 °C.
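The two cycling programs can be written down as data, which makes them easy to compare and to check. The following is a minimal sketch, not part of the original protocol: the class name `PcrProgram` and its field names are illustrative, and the computed time covers programmed hold steps only (thermocycler ramp times are not given in the text and are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """Hold times (seconds) and annealing temperature of a three-step PCR program."""
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    extension_s: int
    final_extension_s: int
    annealing_temp_c: int

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding temperature ramps."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values taken from the conditions stated in the text.
CDNA_PROGRAM = PcrProgram(300, 35, 30, 30, 30, 300, annealing_temp_c=56)
DNA_PROGRAM = PcrProgram(300, 35, 45, 45, 90, 300, annealing_temp_c=57)

for label, program in (("cDNA", CDNA_PROGRAM), ("DNA", DNA_PROGRAM)):
    print(f"{label}: {program.hold_time_s() / 60:.1f} min of programmed holds")
```

Under these assumptions the cDNA program amounts to 3750 s (62.5 min) and the DNA program to 6900 s (115 min) of hold time; the longer extension step of the DNA program accounts for most of the difference.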

Table 1 columns: Name, Sequence, Amplicon Length (nt).

Agarose Gel Electrophoresis
PCR-amplified samples were subjected to electrophoresis at 110 mV in 1.5% agarose gels in 1× Tris-acetate-EDTA (TAE) buffer containing 0.5 µg/mL of ethidium bromide.

Bioinformatics Tools
The genomic insertions identified in L. longipalpis were investigated in silico using the BLAST tool (

Alignment
The nucleotide sequences of the EVEs NcP1.1, NcP1.2 and NcP2 were used as queries for homology searches with the BlastX tool (Basic Local Alignment Search Tool, https://blast.ncbi.nlm.nih.gov/Blast.cgi, v. 2.13.0) against the NCBI databases (National Center for Biotechnology Information, Bethesda, MD, USA). The EVE-deduced amino acid sequences obtained with BlastX were aligned with nucleocapsid proteins from rhabdoviruses using the ClustalW v. 2.1 multiple alignment tool [26].

Evolutionary Analysis by Maximum Likelihood Method
The evolutionary history was inferred by using the maximum likelihood method with a JTT matrix-based model [27]. The tree with the highest log likelihood (−1932.07) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying the neighbor-joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model and then selecting the topology with the superior log likelihood value. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site. This analysis involved 11 amino acid sequences. There was a total of 96 positions in the final dataset. Evolutionary analyses were conducted using MEGA11 v. 11.0.13 [28].

Small RNA Analysis
Public L. longipalpis small RNA libraries were downloaded from the NCBI SRA database, and the reads were merged into a single file to increase depth. Pre-processing of the resulting RNA library was performed as described [29]. Briefly, raw sequences were submitted to quality filters and adaptor removal. Sequences with low Phred quality (<20), ambiguous nucleotides and/or a length shorter than 15 nt were eliminated. Pre-processed reads were aligned against reference sequences using the Bowtie program (v1.1) [30], accepting 1 mismatch. The putative rhabdovirus sequences were compared against the L. longipalpis reference genome (Jacobina strain, version J1.2), downloaded from the VectorBase website (www.vectorbase.com, accessed on 14 April 2022), using BLAST software v. 2.12.0 [31] in its BlastN variation, requiring an e-value < 1 × 10^−5. The small RNA size profile, 5′ base preference and density of coverage were analyzed, along with additional data analyses, using in-house Perl and R scripts.
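The read-level filters described above can be illustrated with a short sketch. It is not the pipeline used in the study (that one is described in [29]): the function names are hypothetical, Phred+33 quality encoding is assumed, and "Phred quality <20" is interpreted here as the mean quality of the read, which the text does not specify.

```python
import re

MIN_MEAN_PHRED = 20   # assumption: threshold applied to the mean read quality
MIN_LENGTH_NT = 15    # reads shorter than 15 nt are discarded
UNAMBIGUOUS = re.compile(r"^[ACGT]+$")


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(symbol) - offset for symbol in quality]


def keep_read(sequence: str, quality: str) -> bool:
    """Return True if an adaptor-trimmed read passes all three filters."""
    if len(sequence) < MIN_LENGTH_NT:
        return False                      # too short
    if not UNAMBIGUOUS.match(sequence.upper()):
        return False                      # contains N or another ambiguity code
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= MIN_MEAN_PHRED


# Toy reads: (sequence, quality string)
reads = [
    ("ACGTACGTACGTACGT", "I" * 16),   # high quality, long enough
    ("ACGTACGTACGTACGN", "I" * 16),   # ambiguous base
    ("ACGTACGT", "I" * 8),            # shorter than 15 nt
    ("ACGTACGTACGTACGT", "#" * 16),   # low quality
]
passed = [sequence for sequence, quality in reads if keep_read(sequence, quality)]
print(f"{len(passed)} of {len(reads)} reads kept")
```

Only reads passing the filters would then be aligned with Bowtie; a per-base quality criterion instead of the mean would be an equally plausible reading of the text.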

Determination of the Origin of RNA Sequences Coding for Viral Proteins in Exosomal Fraction of LL5 Cells
In previous work, nucleic acid sequencing was performed on an exosomal pellet from L. longipalpis LL5 embryonic cells. After alignments with databases, two partial RNA coding sequences of 477 and 459 nucleotides showing similarity to rhabdovirus nucleocapsid proteins were identified and named NcP1.1 and NcP2, respectively (Figure S1).
The presence of RNA fragments coding for rhabdoviral nucleocapsid proteins in the exosomal fraction of LL5 cells raised the question of their origin. These sequences could be derived from an exogenous viral infection or from viral insertions in the insect genome. Since members of the Rhabdoviridae family are negative-sense single-stranded RNA viruses, whose genome is never in the form of DNA, we performed PCR assays using DNA and cDNA samples from LL5 cells as templates to answer this question (Figure 1). Both templates were positive, revealing that the sequences were derived from the transcription of viral elements inserted in the genome of LL5 cells. These results were confirmed by analysis of the L. longipalpis genome data deposited in the VectorBase and NCBI databases using the BlastN tool v. 2.13.0.
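The reasoning behind the DNA/cDNA comparison can be made explicit as a small decision rule. This is an illustrative sketch only: the function name and the category labels are not from the paper, and it assumes that the RNA used for cDNA synthesis is free of genomic DNA.

```python
def classify_origin(dna_positive: bool, cdna_positive: bool) -> str:
    """Interpret PCR results for a sequence resembling an RNA virus with no DNA stage.

    Assumes the cDNA template was prepared from RNA free of genomic DNA,
    so a cDNA signal reflects RNA and a DNA signal reflects the genome.
    """
    if dna_positive and cdna_positive:
        return "transcribed endogenous viral element"
    if dna_positive:
        return "endogenous viral element, transcript not detected"
    if cdna_positive:
        return "RNA only, consistent with an exogenous RNA virus"
    return "sequence not detected"


# LL5 cells were positive with both templates (Figure 1).
print(classify_origin(dna_positive=True, cdna_positive=True))
```

The key step is the first branch: a rhabdovirus genome never exists as DNA, so amplification from a DNA template points to a genomic insertion rather than an ongoing infection.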

Genomic Context of NcP1.1 and NcP2 in the L. longipalpis Genome
BlastN using the NcP1.1 and NcP2 sequences as queries against the L. longipalpis genome deposited in VectorBase confirmed the result obtained with the PCR experiments, showing that both sequences were present in the genome and located in intronic regions of two deduced protein-coding genes. The EVE NcP1.1 is located between nucleotides 63,562 and 64,974 of supercontig JH689452. This region lies in an intron of the unannotated putative protein-coding gene LLOJ001560. Surprisingly, BlastP analysis of the deduced amino acid sequence encoded by this gene against the NCBI database showed homology, with 35% identity, to the viral capsid protein of Nebet virus (Seq ID: QRW425091). Furthermore, BlastX analysis of the region where NcP1.1 is located revealed another EVE of 950 bp, with homology to viral nucleocapsid protein, in the intron of this putative protein-coding gene, located between nucleotides 62,442 and 61,493 of supercontig JH689452. This new EVE was named NcP1.2. Interestingly, NcP1.1 and NcP1.2 present different transcriptional orientations (Figure 2). NcP2 is located between nucleotides 58,124 and 59,430 of supercontig JH689584, an intronic region of the deduced protein-coding gene LLOJ004474. BlastP analysis of the deduced amino acid sequence encoded by this gene revealed homology with the hrp65 protein, which is related to the transport of RNA from the nucleus to the cytoplasm of the cell (Figure 2). In addition to confirming the PCR results and identifying another viral insertion in the genome (NcP1.2), the BlastN and BlastX analyses against the L. longipalpis genome deposited in VectorBase revealed that NcP1.1 and NcP2 were longer than previously identified: NcP1.1 and NcP2 went from 477 and 459 bp to 1413 and 1307 bp, respectively (Figure S1).
BlastN analysis of the NcP1.1, NcP1.2 and NcP2 sequences against the NCBI nucleotide database showed that NcP1.1 and NcP1.2 presented 100% identity with the uncharacterized mRNA LOC129786293, and that NcP2 showed 99% identity, with four gaps in 1311 nucleotides, to the L. longipalpis hrp65 protein (LOC129793427) mRNA, transcript variant X2. The same analysis revealed that all three EVEs are located on chromosome I of L. longipalpis isolate SR_M1_2022. These results confirm what we observed experimentally for NcP1.1 and NcP2 and reveal that the EVE NcP1.2 is also transcribed in L. longipalpis.
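The supercontig coordinates reported for the three insertions can be checked against the stated lengths with a few lines of code. The sketch below uses only the coordinates given in the text; the class and field names are illustrative, and reading a descending coordinate pair as the opposite orientation on the supercontig is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """An EVE located by two 1-based, inclusive supercontig coordinates."""
    name: str
    supercontig: str
    first: int
    last: int

    @property
    def length_bp(self) -> int:
        # Inclusive coordinates: both end positions count.
        return abs(self.last - self.first) + 1

    @property
    def orientation(self) -> str:
        # Assumption: descending coordinates indicate the reverse orientation.
        return "+" if self.last >= self.first else "-"


EVES = [
    Insertion("NcP1.1", "JH689452", 63_562, 64_974),
    Insertion("NcP1.2", "JH689452", 62_442, 61_493),
    Insertion("NcP2", "JH689584", 58_124, 59_430),
]

for eve in EVES:
    print(f"{eve.name}: {eve.supercontig} {eve.length_bp} bp ({eve.orientation})")
```

The computed lengths (1413, 950 and 1307 bp) match the values reported in the text, and NcP1.1 and NcP1.2 come out in opposite orientations on JH689452, in line with Figure 2.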

Multiple Alignment and Phylogenetic Analysis of EVE-Deduced Protein Sequences
The sequences of NcP1.1, NcP1.2 and NcP2 were translated and aligned with nucleocapsid protein sequences from various rhabdoviruses. The alignment suggested that all of them derive from different rhabdovirus infections, since they aligned in the same region of the deduced proteins. A phylogenetic analysis comparing the EVEs NcP1.1, NcP1.2 and NcP2 with nucleocapsid sequences from modern rhabdoviruses revealed that NcP1.1 and NcP2 were evolutionarily closer to each other than to nucleocapsids from current rhabdoviruses. NcP1.2 also showed little proximity to the modern rhabdoviruses (Figure 3).

Molecular Characteristics of Small RNA Sequences Derived from Rhabdoviral Sequences
It has been shown that the small RNA profile can serve as a proxy for determining the origin of a viral sequence [29][30][31][32]. Pre-processing of the small RNA library was performed as described [29] to determine whether the sequences NcP1.1, NcP1.2 and NcP2 had characteristics of viral elements inserted into the genome. Sequences NcP1.1 and NcP1.2 presented a profile consistent with elements targeted by primary piRNAs (accumulation of RNA sequences between 24 and 29 nt derived from only one strand), while NcP2 was restricted to a degradation profile (low abundance of small RNA sequences of different lengths). In addition, the density of small RNA sequences along the sequences was discontinuous, with hotspots in specific regions and coverage concentrated in one strand (Figure 4). Thus, since they presented most of the canonical features of EVEs in insects, they could be classified as endogenous elements.
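The two profiles described above differ in read abundance, size distribution and strand bias, and these quantities are simple to compute from aligned reads. The sketch below is a simplified illustration, not the in-house Perl and R scripts used in the study: the function names are hypothetical, the input is a toy list of (read length, strand) pairs, and the thresholds `min_reads` and `min_fraction` are arbitrary values chosen for the example.

```python
from collections import Counter


def summarize_profile(alignments):
    """Summarize aligned small RNAs given as (read_length, strand) pairs."""
    sizes, strands = Counter(), Counter()
    for length, strand in alignments:
        sizes[length] += 1
        strands[strand] += 1
    total = sum(sizes.values())
    in_pirna_range = sum(n for length, n in sizes.items() if 24 <= length <= 29)
    return {
        "total": total,
        "pirna_fraction": in_pirna_range / total if total else 0.0,
        "major_strand_fraction": max(strands.values()) / total if total else 0.0,
    }


def resembles_primary_pirna_target(summary, min_reads=100, min_fraction=0.8):
    """Hypothetical rule: abundant 24-29 nt reads, mostly from one strand."""
    return (summary["total"] >= min_reads
            and summary["pirna_fraction"] >= min_fraction
            and summary["major_strand_fraction"] >= min_fraction)


# Toy data: a piRNA-like element and a sparsely covered one.
pirna_like = [(27, "+")] * 90 + [(21, "-")] * 10
degradation_like = [(18, "+"), (22, "-"), (31, "+"), (25, "-"), (16, "+")]

for label, reads in (("piRNA-like", pirna_like), ("degradation-like", degradation_like)):
    summary = summarize_profile(reads)
    print(label, summary, resembles_primary_pirna_target(summary))
```

A full analysis would also examine 5′ base preference and coverage along the sequence, as stated in the methods; this example covers only size and strand.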

Distribution of Rhabdoviral Sequences in Sandfly Populations
We investigated whether the EVEs NcP1.1 and NcP2 and their transcripts, identified in the genome of L. longipalpis from Jacobina, BA, and in LL5 cells derived from L. longipalpis from Lapinha, MG, Brazil, are found in different populations of sandflies from the New and Old World.
A total of 61 insect samples, 58 from South America (53 from Brazil, 2 from Argentina and 1 from Colombia) were investigated (Figure 5). Five samples of different species from the Old World were obtained from insectary colonies. DNA, RNA, or both were extracted from these samples. Samples not analyzed were noted as not determined (ND). We observed that the EVEs NcP1.1 and NcP2 had a wide distribution, being present and transcribed in sandfly populations from all regions studied (Figure 6). We also found that Old World insectary specimens of the genus Phlebotomus (Phlebotomus arabicus, P. argentipes, P. papatasi, P. sergenti and P. schwetzi), found in nature in Africa, Asia and Europe, did not present these EVEs in their genomes.

Figure 1. PCR assays to amplify the NcP1.1 sequence using cDNA (A) and DNA (B) from the LL5 cell line as templates.

Figure 2. Graphical representation of the insertion sites and transcriptional sense of the EVEs NcP1.1, NcP1.2 and NcP2 in introns of L. longipalpis genes. (A) Localization and sense of transcription of EVEs NcP1.1 and NcP1.2 (hatched arrows) in the intron of gene LLOJ001560 (double-dashed line). (B) Localization and sense of transcription of EVE NcP2 (hatched arrow) in the intron of gene LLOJ004474 (double-dashed line). The transcriptional orientation of each EVE is indicated by the direction of its hatched arrow. Black arrows represent the gene exons.

Figure 3. (A) Alignment of the deduced amino acid sequences of NcP1.1, NcP1.2 and NcP2 with other rhabdovirus nucleocapsid proteins. Regions with a gray or black background indicate similar or identical amino acids, respectively. (B) Phylogenetic tree showing the relationship among these EVE-deduced (black rhombus) amino acid sequences and nucleocapsid protein sequences from rhabdoviruses.

Figure 5. Geographic distribution of the origin of sandfly populations from South America analyzed in this work.

Figure 6. Determination of the presence and transcription of EVEs NcP1.1 and NcP2 in sandflies from different populations of South America and the Old World. Green (dark gray) represents a positive result, while light gray indicates a negative result. ND means the experiment was not performed.

Table 1. Primers employed in PCR assays.