Conserved Small Nucleotidic Elements at the Origin of Concerted piRNA Biogenesis from Genes and lncRNAs

PIWI-interacting RNAs (piRNAs) target transcripts by sequence complementarity serving as guides for RNA slicing in animal germ cells. The piRNA pathway is increasingly recognized as critical for essential cellular functions such as germline development and reproduction. In the Anopheles gambiae ovary, as much as 11% of piRNAs map to protein-coding genes. Here, we show that ovarian mRNAs and long non-coding RNAs (lncRNAs) are processed into piRNAs that can direct other transcripts into the piRNA biogenesis pathway. Targeting piRNAs fuel transcripts either into the ping-pong cycle of piRNA amplification or into the machinery of phased piRNA biogenesis, thereby creating networks of inter-regulating transcripts. RNAs of the same network share related genomic repeats. These repeats give rise to piRNAs, which target other transcripts and lead to a cascade of concerted RNA slicing. While ping-pong networks are based on repeats of several hundred nucleotides, networks that rely on phased piRNA biogenesis operate through short ~40-nucleotides long repeats, which we named snetDNAs. Interestingly, snetDNAs are recurring in evolution from insects to mammals. Our study brings to light a new type of conserved regulatory pathway, the snetDNA-pathway, by which short sequences can include independent genes and lncRNAs in the same biological pathway.


Introduction
PIWI-interacting small RNAs (piRNAs), 24 to 29 nucleotides (nt) long, are essential actors in controlling transposable elements (TEs) in animal gonads. Specific genomic loci called piRNA clusters, of tens or even hundreds of kilobases (Kb), produce large amounts of piRNAs. Owing to sequence complementarity, piRNAs from piRNA clusters guide PIWI proteins to cleave TE transcripts and induce their processing into new piRNAs. These new piRNAs can themselves act as PIWI guides for the cleavage of complementary transcripts inducing production of piRNAs that are identical to the "initiator" piRNAs that originated from the piRNA clusters. This leads to a piRNA amplification process, which has been called the ping-pong cycle [1,2], reviewed by Ozata et al. [3]. piRNAs display specific characteristics related to their biogenesis and to ping-pong amplification. piRNAs that originate from ping-pong amplification show a specific 10-nt 5 -overlap due to the fact that PIWI proteins slice Figure 1. Phased PIWI-interacting RNAs (piRNA) biogenesis in Drosophila melanogaster: Trigger-piRNAs anneal complementary transcripts inducing slicing of the latter to responder-piRNAs that have 10-nt 5 -overlap with trigger-piRNAs. Further slicing of the transcript at uridine residues downstream of the responder-piRNA leads to production of trail-piRNAs. 5 -uridine (U) is largely overrepresented in trail-piRNAs (adapted from Mohn et al. [5]). TE-related sequences are not the only producers of piRNAs. In several organisms, piRNAs originating from the 3 untranslated regions (UTR) of gene transcripts have been reported [7][8][9][10]. Several studies indicate that piRNAs play a role beyond TE silencing (reviewed in Sarkar et al. [8]) and can be involved in diverse cellular processes like programmed genome rearrangement and genome-wide surveillance of germline transcripts [11,12], mRNA, or long non-coding RNA (lncRNA) regulation and development [13][14][15][16][17][18] (reviewed in Ku and Lin and in Rojas-Ríos and Simonelig [19,20]).
Here, we sought to address whether piRNAs, if produced from protein-coding genes and unrelated to any TE-sequences, may lead to piRNA-guided mRNA cleavage and may establish a piRNA regulation network dedicated to gene regulation. To explore this possibility, we studied piRNAs from Anopheles gambiae ovaries. The African malaria vector An. gambiae particularly suits this analysis because, first, piRNAs with a ping-pong signature are produced in the ovarian tissue [21][22][23] and, second, a high proportion of them, 11%, are derived from protein-coding transcripts [21]. We here uncover piRNA biogenesis in An. gambiae resulting from ping-pong amplification between transcripts from distant genes. We identify a novel "intrinsic ping-pong amplification", which is achieved with genome-unique piRNAs originating from the same single gene. We also evidence phased piRNA biogenesis from genes and lncRNAs, which is initiated by trigger-piRNAs that are not related to nowadays known TEs and originate from other genes and lncRNAs at distant loci. When examined across distant species, we further bring evidence for a new type of regulatory pathway, named snetDNA-pathway, conserved throughout evolution and based on particularly short genomic sequences, the snetDNAs ("small-RNA-network DNA"), which establish regulatory networks through piRNA targeting and phased piRNA biogenesis.

Mapping Small RNA Reads
Reads were mapped against AgamP4.2 and AaegL3, genome assemblies for An. gambiae and Ae. aegypti respectively, allowing zero mismatch, using sRNAPipe [27]. Small RNAs of 24-29 nt length were considered as putative piRNAs. They were mapped against transcripts or TEs using scripts and pipelines based on Burrows-Wheeler Aligner (BWA) [28], Bowtie2 [29] or NucBase [30]. RPM (reads per million) for each sequence were calculated as the number of reads per million genome-mapped reads allowing no mismatch. Multi-mapping reads were randomly assigned to the mapping sequences. Transcript-mapping reads were identified by alignment to protein-coding transcripts from genome assemblies AgamP4.2 (An. gambiae) and AaegL3.2 (Ae. aegypti), available at VectorBase [31], and to lncRNAs [32]. TE-matching reads were identified by alignment to all TE sequences for the subphylum Hexapoda that are available on RepBase (http://www.girinst.org) [33], allowing up to 3 mismatches.
Bona fide reads were defined as small RNAs which do not map microRNAs (miRNAs), ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs) or transfer RNAs (tRNAs). "Collapsed" counts were determined from collapsed libraries where all identical reads were reduced to one single read. "Non-collapsed" counts originate from the initial sequencing libraries. "Multiplicity" was calculated as the ratio of the non-collapsed count of piRNAs over the collapsed count (Table S1).
The application "Tablet" was used for visual exploration of piRNA mapping [34].

Analyses of Ping-Pong Signature
Ping-pong signature was analysed essentially as in Grentzinger et al. [35]. The putative piRNAs, 24-29 nt in length, were mapped to the transcripts as above. Their 5 -positions on the mapped sequences were determined from the resulting sam files by a dedicated script, available upon request. The 5 -overlaps between sense-oriented piRNAs and antisense-oriented piRNAs were calculated for all opposite piRNAs with a maximal distance of 24 nt between their respective 5 -positions. This maximal overlap of 24 nt was chosen according to the size of the considered piRNAs, 24-29 nt, and corresponds to the maximal overlap which is possible for all considered piRNAs to avoid the bias observed above this limit. We concluded for a significant 10-nt 5 -overlap when its z-score was >1.96, within the window of 5 -overlaps from 1 to 24 nt (including 10-nt 5 -overlaps), at the condition that the total number of overlapping piRNA pairs within this window was at least 30 (for any overlap length).

SnetDNA Consensus Sequences
To build snetDNA consensus sequences, the genomic sequences that aligned to 2R_lncs_14 and AGAP011923 snetDNAs and flanking sequences (BLASTN E = 10As described in "BLASTN analyses") were extracted from the respective genomes (An. gambiae AgamP4, Ae.aegypti AaegL3, mouse GRCm38.p5 and human GRCh38.p8) and aligned using "Muscle" and "ClustalW" multiple sequence alignment tools included in MacVector 17.0.5.

snetDNAs and Genomic Annotations
We extracted gene annotations from GENCODE basic annotations v28 for the human genome (GRCh38.p12) as well as GENCODE basic annotations v19 for the mouse genome (GRCm38.p6) [39], downloaded from the GENCODE ftp site. For each gene, we selected its longest transcript to define its exons/introns structure. In addition, we extracted repeat annotations predicted with RepeatMasker (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-4.0. 2013-2015, http://www.repeatmasker.org) for human and mouse and available from the UCSC portal [40]. To better characterize the genomic locations of snetDNAs, we tested the overlap between snetDNAs and several genomic features using the GenometriCorr package [41]. The GenometriCorr package implements spatial tests of independence between two sets of genomic intervals. We used the Jaccard test to compare observed overlaps to a null distribution based on permutations of the snetDNAs intervals across the genome.

Expression of Genes Containing snetDNAs
We investigated expression enrichment towards particular tissues amongst protein-coding genes containing a snetDNA, using the TissueEnrich package [42]. TissueEnrich defines tissue-specific genes using RNA-Seq data in 35 human tissues from the Human Protein Atlas [43] and in 17 mouse tissues from the Mouse ENCODE Dataset [44]. The package then provides a test for significant enrichment of tissue-specific genes towards particular tissues, using a hypergeometric test with the Benjamini and Hochberg correction for multiple testing.
The scripts are available upon request.

Accession Numbers
All small RNA sequencing data used here are available at the NCBI SRA database or NCBI GEO DataSets. They have been published by us or by others and are publicly available. The accession numbers are given in the Section 2 together with the corresponding reference.

Genes May Give
Rise to piRNAs Implicated in Ping-Pong Amplification in An. gambiae Ovaries Small RNAs from An. gambiae ovaries show a bimodal length distribution with one peak at 22 nt attributed to microRNAs and a broad peak spanning 24-29 nt [21]. Here, we analysed only small RNAs of 24-29 nt length, the putative piRNAs.
We focused on piRNAs mapping to protein-coding genes without mismatch. We found that 870 mRNAs map >3 Reads Per Million (RPM) of piRNAs. piRNAs mapping to these 870 mRNAs make up 10.4% of all genome-mapping piRNAs. Among the mRNA-mapping piRNAs, 93.3% is genome-unique, i.e., they map only once in the An. gambiae genome. The majority of the 870 mRNAs produce only sense piRNA reads (564 transcripts), four mRNAs produce only antisense, and 302 mRNAs map both sense and antisense piRNA reads ( Figure 2A).
When both sense and antisense piRNAs are produced from the same mRNA, there is a possibility of ping-pong amplification [1]. To test whether genic piRNAs may be amplified by the ping-pong mechanism, we searched sense and antisense piRNAs that have a 5 -overlap of 10 nt (based on their 5 -end positions when mapped to the transcripts; see the Section 2). We found a significant overrepresentation of 10-nt 5'-overlaps for 97 transcripts (Figure 2A-C and Table S1). The median amount of piRNAs is 2-times higher for these 97 transcripts than for the 773 piRNA-mapping transcripts without enriched 10-nt 5'-overlaps (Supplementary Figure S1). The piRNAs from these 97 transcripts also show significant 1U and 10A overrepresentation ( Figure 2D,E). Actually, 55% of the 97 mRNAs have both 1U-biased (>50% 1U) antisense piRNAs and 10A-biased (>50% 10A) sense piRNAs and 32% have 1U-biased sense piRNAs and 10A-biased antisense piRNAs ( Figure 2D and Table S1). On the opposite, antisense-oriented piRNAs mapping mRNAs without enriched 10-nt 5'-overlaps show only 1U bias but no 10A enrichment, and their sense piRNAs are neither 1U-nor 10A-enriched. This is true for both non-collapsed and collapsed libraries ( Figure 2E). The latter unexpected data reveal a new category of small RNAs, which might be specific to mRNAs, 24-29 nt long, without any 1U or 10A signatures.
Until now, ping-pong amplification of endogenous piRNAs was known for TEs. Thus, we checked whether the ping-pong observed for 97 mRNAs might be due to the presence of TE-homologous sequences. We tested whether the piRNAs with a ping-pong partner (with 10-nt 5 -overlap), the "ping-pong-piRNAs", map TEs. We found only four transcripts with a high proportion of ping-pong-piRNAs mapping TEs ( Figure 2F and Table S1). Consequently, ping-pong for mRNAs is mostly unlinked to the presence of TE-related sequences. We then checked whether the ping-pong-piRNAs map to other repeated sequences in the An. gambiae genome. The results show that, for 82 out of 97 transcripts, the ping-pong-piRNAs map to repeats other than TEs. For the remaining 11 transcripts, all ping-pong-piRNAs map to genome-unique sequences. Red spots: mRNAs with significant 10-nt 5 -overlap ping-pong signature. Grey spots: mRNAs without ping-pong signature. (C) A zoom-in from the scatterplot in Figure 2B. (D) Heat map of 1U and 10 A percentages for sense and antisense (as) bona fide piRNAs mapping the 97 mRNAs that show significant enriched 10-nt 5 -overlap. (E) Percentages of 1U and 10A signatures from non-collapsed (NC) and collapsed (coll) libraries (see the Section 2) for all bona fide piRNAs (left) and for piRNAs from protein-coding genes, except AGAP003387, with or without significant ping-pong signature ("PP" or "noPP", respectively). (F) For the 97 mRNAs in Figure 2A, the heat map shows percentages of piRNAs with ping-pong partner among sense and antisense (as) piRNAs (PP); percentages of piRNAs that map insect TEs, allowing 0-3 mismatches, among piRNAs with ping-pong partner (TE-PP); and percentages of piRNAs that map repeated sequences in the genome among piRNAs with ping-pong partner (rep-PP). (G,H) Principal component analysis.
To extract the important information from our data concerning piRNAs mapping protein-coding transcripts (Table S1), we performed a Principal Component Analysis (PCA) [45] ( Figure 2G; individuals' coordinates and variables' contribution are given in Table S2, a correlogram for the PCA variables is shown in Supplementary Figure S2, and the corresponding raw data are presented in Tables S3 and S4). Bona fide piRNAs, defined as all small RNAs 24-29 nt in length which do not map miRNAs, rRNAs, snRNAs or tRNAs (see also the Section 2), and genome-unique piRNAs were considered separately. The variables that had the highest contribution to variance were the z-score and the percentage of 10-nt 5 -overlaps, for bona fide piRNAs (BZ and BpctPP respectively) and for genome-unique piRNAs (UZ and UpctPP respectively), the ratio of bona fide sense over antisense piRNAs (LOGratioBsBas), and the percentage of genome-repeated piRNAs (pctrepeats). The PCA distinguishes essentially three groups of transcripts mapping piRNAs: mRNAs without significant ping-pong, mRNAs mapping repeated or TE-matching piRNAs doing ping-pong, and mRNAs mapping genome-unique piRNAs doing ping-pong ( Figure 2H).
As already mentioned above, eleven transcripts achieve ping-pong amplification with genome-unique piRNAs (Table S1). These 11 mRNAs originate from 8 different genes. Bar plots of mapping piRNAs illustrate the high proportion of piRNAs with ping-pong partners originating from the same gene ( Figure 3). We checked whether piRNAs from other loci may also target these transcripts by mismatched pairing (allowing up to 5 mismatches) and may trigger ping-pong amplification. We found that such a "trans-ping-pong" [4] is possible for these transcripts but that piRNAs with ping-pong partners originating from one same gene are the most abundant ( Figure 3 and Table S5). Thus, our data provide the first evidence that ping-pong amplification may be achieved with genome-unique piRNAs originating from one same single genomic sequence. We propose to call this ping-pong amplification "intrinsic ping-pong".

piRNAs from Genomic Repeats May Fuel mRNAs from Distant Loci into the Ping-Pong Cycle of piRNA Amplification and Create "Ping-Pong Networks"
We then focused on the group of mRNAs mapping repeated ping-pong-piRNAs. The aim of this study was to find out whether piRNAs originating from one gene may target transcripts of other genes and engage them in ping-pong amplification. We searched for ping-pong-piRNAs that may target several genes. We identified 58 mRNAs forming 17 different groups, the mRNAs of each group being targeted by the same ensemble of piRNAs. These piRNAs originate from a repeated sequence, which is present in all transcripts of the same group. Actually, these 17 groups represent what we named "ping-pong networks" (Table S6). To check effective piRNA production for the 58 transcripts of the 17 putative ping-pong networks, we searched for genome-unique piRNAs mapping these transcripts. There may be spreading to the flanking sequences or erroneous annotation of the transcripts omitting part of the UTRs, as has been observed for AGAP003266-RA (see Gomez-Diaz et al. [46] and below). Thus, we also analysed the flanking sequences, which could in fact produce piRNAs. We found that 32 of the 58 transcripts match >0.5 RPM genome-unique piRNAs and that, when including flanking sequences (5 kb up-and downstream), 52 of the 58 transcript loci match >0.5 RPM genome-unique piRNAs. These results indicate that at least 52, i.e., 90%, of the 58 ping-pong network transcripts correspond indeed to piRNA producing loci (Table S7). To identify the repeats which are common to the transcripts of each ping-pong network, we made multiple sequence alignments and built the corresponding sequence logos (Supplementary Information SI-MSA1). The results evidence different repeats of 130 to 1130 nucleotides. These repeats and the piRNAs produced from the ping-pong networks do not contain any low complexity sequences (e.g., repeats of di-or trinucleotides). Only one of the two repeats of the ping-pong network 12, repeat A, contains several grouped cytosine-adenine-guanine (CAG) triplets (see Supplementary Information SI-MSA1), but these sequences do not match any piRNAs. Consequently, there is no bias of piRNA matching due to low complexity repeats.
Our data show that piRNA biogenesis may occur by ping-pong amplification between mRNAs from a group of genes that contain related repeats. Because mRNAs are sliced when processed to piRNAs, this mechanism may constitute a basis for concerted gene regulation within each ping-pong network. Normalized counts (RPM) of genome-unique (blue) and repeated (red) piRNAs for all mapping piRNAs (top) and for piRNAs having ping-pong partners (second, third and bottom rows): In the first and second rows, only piRNAs matching the transcripts without any mismatch ("0 mm") are shown; in the third row, sense piRNAs matching the transcripts without mismatch together with mismatched (1 to 5 mismatches, "1-5 mm") antisense ping-pong partners are shown; and in the bottom row, antisense piRNAs matching the transcripts without mismatch together with mismatched sense ping-pong partners are shown.

piRNAs from Genes and lncRNAs May Direct Other Transcripts to Phased piRNA Biogenesis, Thus Creating Networks of Inter-Regulation
In the Drosophila germline, TE-derived piRNAs, the "trigger-piRNAs", may target genic mRNAs and lead to piRNA-guided slicing of the latter and phased piRNA biogenesis [5,6]. This process is different from ping-pong by the fact that it does not lead to a piRNA amplification loop but to the production of sense-oriented phased "responder-" and "trail-piRNAs" [5] (Figure 1). We decided to examine whether piRNAs which originate from genic transcripts or lncRNAs may be complementary to other transcripts and may initiate their phased slicing. To achieve this, 24-29 nt reads were mapped against the genome allowing no mismatch and only genome-unique reads (that map only once to the genome) that match mRNAs or lncRNAs were considered for further analyses. This has the advantage that the origin of these piRNAs is clearly known. To find potential targets, these piRNAs were aligned in a reverse complementary manner, allowing 1 to 5 mismatches, to the 664 mRNAs and 20 lncRNAs that match >3 RPM genome-unique sense-oriented piRNAs without mismatch (Table S8). We identified 310 transcripts from 282 genes and 5 different lncRNAs that may be targeted by >3 RPM piRNAs (Table S8). In Drosophila, trigger-piRNAs show a precise 10-nt 5 -overlap to the responder-piRNAs that are produced by slicing of the target transcript ( Figure 1) [5,6]. Thus, we searched for pairs of putative trigger-and responder-piRNAs showing the typical 10-nt 5 -overlap. We identified a total of 209 RPM trigger-piRNAs, all genome-unique, and 88,079 RPM-corresponding responder-piRNAs, 88,069 RPM of them genome-unique and 10 RPM repeated twice in the genome, originating from five different transcripts (three mRNAs and two lncRNAs). They correspond to 12 trigger/responder pairs, constituting 12 different putative trigger events (Table S9). These trigger events result in a network shown in Figure 4A. It shows where the trigger-piRNAs originate from and which transcripts they target, together with the amounts of the respective piRNAs. Complementarity of the trigger-piRNAs to the targeted transcripts (mismatched) and of all piRNAs originating from the targeted transcripts (no mismatch), including responder-and trail-piRNAs, are shown in Figure 4B-E and in Supplementary Figure S3. Notably, the same trigger-piRNAs may target both different transcripts and several sites within the same transcript.
A first set of trigger events involves AGAP011923 and AGAP003266 transcripts ( Figure 4A). Trigger-piRNAs from AGAP011923 induce responder-piRNA slicing at 2 different positions within the 3 UTR of AGAP003266 (reported in Gomez-Diaz et al. [46]) (Supplementary Figure S3C,D). In this particular case, the responder-piRNAs from AGAP003266 may target back the AGAP011923 transcript from which the corresponding trigger-piRNAs originate. This process results in trans-ping-pong amplification [4] between the transcripts AGAP011923 and AGAP003266 ( Figure 4A). It is noteworthy that the targeting piRNAs from AGAP011923 and AGAP003266 all have 1U and 10A (Table S9). Trail-piRNAs are found downstream of each of the three trigger/responder-piRNA peaks (two for AGAP003266 and one for AGAP011923), indicating that these trigger events induce effective phased piRNA biogenesis (Supplementary Figure S3A,C). These data indicate that responder-piRNAs may actually become trigger-piRNAs targeting back the sequence that gave the initial trigger-piRNAs.
Trigger-piRNAs from AGAP011923 also induce phased piRNA biogenesis by slicing of the lncRNA 2R_lncs_14. The resulting trail-piRNAs from 2R_lncs_14 include 54.6 RPM piRNAs that are complementary to an unannotated RNA that we named unk_RNA_1 ("unknown RNA 1") ( Figure 4A). We aimed to establish whether these trail-piRNAs could act as trigger-piRNAs. We identified 7.3 RPM-corresponding responder-piRNAs from unk_RNA_1 ( Figure 4A and Supplementary Figure S3E,F). To know whether there may be ping-pong between 2R_lncs_14 and unk_RNA_1, we looked whether unk_RNA_1 responders are complementary to the 2R_lncs_14 transcript. We found them complementary with 5 mismatches over 25 nt, but they do not match at the position of the initial trigger-piRNA within 2R_lncs_14. They match 113 nt downstream of it ( Figure 4B). Thus, there is no ping-pong between these two transcripts. The targeting of the downstream position by the unk_RNA_1 piRNAs induces slicing of 2R_lncs_14 and leads to production of 1239.4 RPM responder-piRNA. These results indicate that both trail-piRNAs and responder-piRNAs can become trigger-piRNAs for piRNA-guided slicing. This leads to an amplification of piRNA processing by a process of back-targeting, which differs from a ping-pong loop if the back-targeted sequence is different from the initial trigger-producing site.
In all cases, targeting by trigger-piRNAs very efficiently induces phased piRNA biogenesis. Relatively few trigger-piRNAs induce slicing of high amounts of responder-piRNAs, especially in the case of AGAP003387. Indeed, AGAP003387 is targeted by 63.9 RPM trigger-piRNAs from AGAP011923 and is sliced into 86624 RPM responder-piRNAs and 1767 RPM trail-piRNAs, totaling 9% of all genome-mapping piRNAs in the An. gambiae ovary [21] (Figure 4A,D,E).
In conclusion, in An. gambiae ovaries, piRNAs from genes can be engaged in a regulation network involving mRNAs and lncRNAs from distant loci. The latter produces trigger-piRNAs inducing piRNA-guided slicing of other unrelated mRNAs or lncRNAs. The resulting responder-piRNAs and trail-piRNAs may themselves become trigger-piRNAs extending the network. Targeting by trigger-piRNAs leads to slicing and thus to the decay of the targeted transcripts.

Genomic Sequences Producing piRNAs Engaged in Networks are Repeated and Conserved from Insects to Mammals
To find out whether similar piRNA networks exist in other species, we aligned the sequences of the five network transcripts from An. gambiae to 38 disease vector genomes (mosquitoes, flies, arachnidae and snail), to Drosophila melanogaster and to mammal genomes (human, mouse and rat) by BLASTN (Basic Local Alignment Search Tool for Nucleotide sequences).
We found that the five network transcripts were conserved either over their whole length (2R_lncs_14, AGAP003387 and AGAP003266) or at least over 80% of their length (AGAP011923 and unk_RNA_1) in the six species of the An. gambiae complex ( Figure 5A-E, Supplementary Figure S4 and Table S10). In the other species, short motifs encompassing approximately 40 nucleotides were conserved across multiple genomes. Strikingly, the majority of these motifs correspond to the sequences encompassing the trigger-piRNA annealing site and the responder-piRNA sequences, which we identified as snetDNA for "small-RNA-network DNA". These sequences were found conserved not only in the 21 Anopheles species but also in the other 22 analysed species. Moreover, the snetDNA sequences were highly repeated. In Culex quinquefasciatus, the snetDNA positioned between 996-1032 in 2R_lncs_14 was found repeated more than 200 times (e-value < 1, conservation >30 nt), 193 copies being clustered together in the same locus of 10.5 kb, organized in tandem repeats of 54-nt length (Table S11). We also found that many of the BLASTN matches hit annotated transcripts ( Figure 5F and Table S12). Moreover, as annotated transcripts do not include all non-coding RNAs, the number of hits in transcripts is probably largely underestimated.
We determined consensus sequences of the best-conserved snetDNAs in An. gambiae, Ae. aegypti, mouse and human and found them highly similar in these evolutionary very distant species of mosquitoes and mammals ( Figure 5G-J and Supplementary SI-MSA2). In all four species, the snetDNAs homologous to 2R_lncs_14 contain a conserved central motif CACTNAAATTTCTTTN(A) 3-6 TTAA. The AGAP011923-RA-related snetDNAs have CANANTTATCAGAAATTTAAGTG in their center. In each case, the 5 -end of the putative responder-piRNA corresponds to a T-rich sequence, TTTCTTT for 2R_lncs_14-related snetDNAs and TTATCAG for AGAP011923-RA-related snetDNAs, with a highly conserved T corresponding to the 5 1U of the putative responder-piRNA. Our data suggest that the piRNA-mediated regulation network of transcripts based on repeated snetDNAs is largely conserved in the animal kingdom.  Table S10). The counts of hits are plotted as sliding windows, corresponding to each similar sequence along the transcript for each individual species. Zero hit is in light grey, and one single hit in the lightest orange. Light orange along the whole transcript indicates existence of a genomic sequence, which is similar all over the transcript length (see also Table S10). This situation is observed in the six species of the An. gambiae complex. On the left, a color code for different categories of species is presented. Mosquitoes are in different blue colors: An. gambiae complex, light blue; other Anopheles, cadet blue; Aedes, cornflower blue; and Culex, blue. Coral is flies; gray is tsetse; green is mammals; and dark golden is other species, including snail, body louse, bed bugs, tick, mite and sand flies. Horizontal bars above the heat map indicate the trigger-responder sites (spanning over the trigger-annealing site and the corresponding responder-piRNAs). Below each heat map is a graphical representation of the totalized counts of hits for all analysed species mapped for each position of the transcript. The detailed BLASTN results with the aligned sequences can be found in Table S10 (for A-E) and Table S12 ( Although well investigated nowadays, lncRNA function remains elusive. It is very interesting to observe that the lncRNA 2R_lncs_14 displays the most conserved sequence of the network. Actually, it is conserved over its whole length in the six species of the An. gambiae complex, and its snetDNAs are conserved in all 43 analysed species ( Figure 5A and Supplementary Figure S4A) with sequence identity ranging from 81% to 97.5% over 35-45 nucleotides (Table S10). Sequences homologous to 2R_lncs_14 snetDNAs are often found clustered together in the same locus with three to nine snetDNAs identified by BLASTN distributed over less than 4 kb (Table S11). This configuration resembles the one of 2R_lncs_14 ( Figure 4B). In multiple cases, the conserved sequence at positions 1408-1440 of 2R_lncs_14 was also present in these clustered snetDNA loci, suggesting that it is another conserved snetDNA (Figures 4B and 5A and Table S11). We checked whether this region of 2R_lncs_14 may be targeted by putative trigger-piRNAs and identified such targeting piRNAs as well as the corresponding responder-piRNAs located at positions 1407-1435 ( Figure 4B and Table S9). Eleven loci with clustered snetDNAs were identified in Ae. aegypti, and ten were identified in Aedes albopictus. At least one such locus with 2 to 9 snetDNAs could be identified in each of the analysed Anopheles, Aedes, and Culex species, with the exception of An. maculatus in which only non-clustered snetDNAs could be identified by BLASTN. We propose that these loci with clustered snetDNAs are orthologs of 2R_lncs_14.

SnetDNAs from the Aedes aegypti Mosquito Produce piRNAs and Establish Networks
An important question is whether the conserved snetDNAs produce piRNAs and establish "snetDNA networks" in mosquito species other than An. gambiae. To answer this question, we analysed small RNAs that had been sequenced from Ae. aegypti adults [24].
In Ae. aegypti, we found piRNAs similar to AGAP011923 trigger-piRNAs (Supplementary Figure S5). They originate from two distant loci, the annotated gene AAEL017228 (AAEL027227 in the AaegL5 assembly) and a region downstream of AAEL009512, that may be in fact its extended 3 -UTR ( Figure 6A). These putative trigger-piRNAs differ by only 2 nucleotides and might thus target the same transcripts. Notably, AAEL017228-RA is the transcript which has the most mapping piRNAs among all annotated transcripts in Ae. aegypti. It also maps >3 times more piRNAs than the AAEL017385 gene which contains tapiR1 and tapiR2, two highly abundant piRNAs in Ae. aegypti, conserved for approximately 200 million years, with biological functions in control of gene expression and embryonic development [47]. Interestingly, the putative AAEL009512 3 -UTR has four identical piRNA-producing sequences (Supplementary Figure S6). These sequences are unique to this region, i.e., they do not match other genomic regions and they are clustered together over less than 4 kb ( Figure 6B), as is observed for snetDNAs within 2R_lncs_14 and its putative orthologs ( Figure 4B and Table S11) or for AGAP003387 in An. gambiae ( Figure 4D).
We also identified piRNAs similar to An. gambiae 2R_lncs_14 responder-piRNAs in Ae. aegypti. They have 10-nt 5'-overlap to the putative trigger-piRNAs from AAEL017228 and AAEL009512 and originate from previously identified 2R_lncs_14 orthologs (see above). Actually, the eleven 2R_lncs_14 orthologs identified in Ae. aegypti (Table S11) produce a large amount of piRNAs, 2.6% of all piRNAs. Each of these orthologs also gives genome-unique piRNAs in addition to non-unique piRNAs, attesting that they all produce piRNAs. The Ae. aegypti snetDNA network is shown in Figure 6A, and alignments of trigger-and responder-piRNAs to the implicated transcripts are shown in Figure 6B-F and Supplementary Figure S6. We performed BLASTN of putative trigger-and responder-piRNA sequences against the Ae. aegypti genome. We found that trigger-piRNAs from AAEL017228 are genome-unique while trigger-piRNAs from the AAEL009512 3 -region map to 4 sites clustered together and are unique to this region. On the other hand, the piRNA sequences similar to An. gambiae 2R_lncs_14 responder-piRNAs were found highly repeated with a total of 74 genomic sites. All except two are localized in the eleven identified 2R_lncs_14 orthologs (Tables S11 and S13). The Ae. aegypti 2R_lncs_14 ortholog with 9 snetDNAs identified by BLASTN produces the highest amount of genome-unique piRNAs, 2824.9 RPM. Interestingly, it overlaps two annotated protein-coding genes in the same orientation, AAEL011224 and AAEL018213. When aligned to piRNAs allowing up to 6 mismatches, we identified a total of 17 snetDNAs, targeted by trigger-piRNAs from AAEL017228, AAEL009512 and AAEL002913 3 -regions, in this 2R_lncs_14 ortholog spanning over 4.6 kb ( Figure 6B,C). Responder-piRNAs from the AAEL002913 3 -region (annotated as AAEL025618-RA transcript in AaegL5) may target back Ae. aegypti 2R_lncs_14 orthologs indicating that, in Ae. aegypti, as in An. gambiae, responder-piRNAs may become trigger-piRNAs ( Figure 6A  Trigger-piRNAs from AAEL017228 and the AAEL009512 3 -region target transcripts from 2R_lncs_14 orthologs, Aaeg_unk_RNA_1 (non-annotated) and the AAEL002913 3 -region. The resulting responder-piRNAs from the AAEL002913 3 -region target back 2R_lncs_14 orthologs, leading to trans-ping-pong amplification. Arrows indicate one-way targeting (blue) or trans-ping-pong (orange). Bars (color legend on the right) indicate percentages of trigger-and responder-piRNAs for each transcript (all mapping reads). Trigger/responder-piRNAs may function both as trigger-or responder-piRNAs.

The Characteristics of snetDNAs in Mouse and Human Suggest that snetDNAs Derive from Ancient Transposable Elements
We analysed small RNAs that had been sequenced from mouse fetal and adult testes [25,26]. We found piRNAs that are similar to An. gambiae and Ae. aegypti network piRNAs (Supplementary Figure S5). These piRNAs match putative mouse snetDNAs determined by BLASTN (in Figure 5A-E and Table S10) either without any mismatch indicating production of piRNAs from these snetDNAs (fetal mouse piRNAs in Supplementary Figure S5) or when allowing up to 4 mismatches indicating possible targeting by piRNAs (adult mouse piRNAs in Supplementary Figure S5).
We analysed the localisation of mouse and human snetDNAs, found by BLASTN analyses (in Figure 5A-E and Table S10) with respect to annotated genes and TE-related sequences. Mouse snetDNAs significantly overlap with genes, long intergenic non-coding RNAs (lincRNAs), LINEs (Long Interspersed Nucleotidic Elements) and DNA repeats. Human snetDNAs overlap with genes, protein-coding genes, introns and DNA repeats (Table S14). In mouse and human, the snetDNA containing TEs belong to the haT transposons of the Charlie subfamily in mouse and of the Charlie/MER and Tigger subfamilies in human. haT transposons represent ancient, inactive transposons [48]. The significant overlap of mouse and human snetDNAs with TEs suggests that snetDNAs in fact derive from TEs.
Genes containing snetDNAs are mostly expressed in multiple or all tissues, but some of these genes are specifically expressed in male gonadal tissue or brain. In fact, from 54 snetDNA containing genes in human, 21 are tissue-specific by TissueEnrich (Table S15). From the 21 tissue-specific genes, 7 are specifically expressed in testes and 8 are specifically expressed in the cerebral cortex. In mouse, out of 25 snetDNA containing genes, 8 are tissue-specific. Two of the latter are specifically expressed in testes, and six are expressed in brain tissues. Tissue-specific expression of snetDNA-containing genes in testes and brain is remarkable because these tissues express piRNAs [14,[49][50][51] making piRNA processing from the transcripts of these genes possible.
Our data show that snetDNAs are evolutionarily conserved in quite distant species, from insects to mammals, and probably derive from TEs. SnetDNAs may produce piRNAs and/or be targeted by piRNAs and thus be at the origin of regulation networks by RNA degradation, as shown here for An. gambiae and Ae. aegypti.

Discussion
Here, we extended the knowledge on biogenesis of piRNA from genes and lncRNAs in An. gambiae. These piRNAs originate not only from ping-pong amplification between genic transcripts but also from phased biogenesis initiated by trigger-piRNAs originating from distant loci. The majority of piRNAs from genes and lncRNAs in An. gambiae show no similarity with known TEs. They constitute a basis for networks of concerted gene regulation, each one involving an ensemble of mRNAs that contain specific repeats, which are at the origin of piRNA production. Two different types of networks are observed: (a) "ping-pong networks" grouping transcripts containing the same repeated sequences, 130 to 1130 base pairs long, producing piRNAs which integrate the ping-pong amplification cycle, and (b) "snetDNA networks" grouping transcripts containing snetDNAs, short repeats of size between 25 and 42 nucleotides that produce trigger-and responder-piRNAs which are at the origin of concerted phased piRNA biogenesis. In the latter network, transcripts producing trigger-piRNAs target other complementary transcripts and induce their slicing to responder-and trail-piRNAs. Responder-and trail-piRNAs may then become trigger-piRNAs targeting other complementary transcripts. If the responder-piRNAs are complementary to the initial trigger-producing transcript, they may target back the latter and induce piRNA amplification between these two transcripts either by trans-ping-pong or phased piRNA biogenesis.
For the "ping-pong networks", we observed piRNA ping-pong amplification between transcripts from distant loci having the same repeats. In addition, for 11 other mRNAs, we identified a novel "intrinsic ping-pong amplification", which is achieved with genome-unique piRNAs originating from the same single gene. Until now, ping-pong amplification had been observed between piRNAs from different loci, i.e., piRNA clusters on one hand and functional TEs on the other hand. Here, we identified 11 transcripts from 8 genes, which produce both sense and antisense ping-pong partners by their own without the need for targeting piRNAs from other loci. An open question now is how these genes become autonomous piRNA-producing loci.
In snetDNA networks, we observed that relatively few trigger-piRNAs may induce slicing of high amounts of responder-piRNAs, especially in the case of AGAP003387. This suggests that, as observed in C. elegans, piRNA concentration does not limit targeting but probably correlates better with binding energy than with piRNA abundance [52].
Our data give insights into the mechanisms of target slicing. Ago3-bound piRNAs in Drosophila have enriched 10A while Aub-bound piRNAs are enriched for 1U only [1]. In the networks described above, 96.5% of the trigger-piRNAs have both 1U and 10A signatures whereas 98.4% of the responder-piRNAs have 1U but no 10A (Table S9). Assuming that 1U and 10A signatures of Ago3-and Aub-bound piRNAs have the same specificity in Drosophila and Anopheles, even if this still has to be tested, we hypothesize that trigger-piRNAs of the An. gambiae snetDNA network are essentially Ago3-bound and responder-piRNAs Aub-bound.
In Drosophila, it is known that responder-piRNAs are essentially bound onto Aub with a minority of them bound onto Ago3 [5]. Han et al. [6] observed that RNAs cut by Ago3 produce phased Aub-bound piRNAs but that RNAs cut by Aub do not produce phased Ago3-bound piRNAs. In the An. gambiae snetDNA network, trigger-responder pairs are not necessarily followed by phased trail-piRNAs (AGAP000203 and AGAP013373 in Figure 3 and 2R_lncs_14 responder-positions 929 and 1122 in Figure 4B). Also, responder-and trail-piRNAs of the network may become trigger-piRNAs. This raises the question to which proteins these piRNAs may be bound. Our observations suggest that, in An. gambiae, trigger-and responder-piRNAs may be bound either to Aub or Ago3 orthologs, like in Drosophila, and that slicing of phased trail-piRNAs may depend on the protein that charged the trigger-piRNAs.
In the Drosophila melanogaster germline, phased piRNA biogenesis depends on the targeting of transcripts by trigger-piRNAs. Pairing between the trigger-piRNAs and the targets supports some mismatches [5]. We assumed that this is the same in the An. gambiae germline. In mouse, some authors suggested that slicing by PIWI argonautes, like Aub and Ago3, requires perfect matching between the transcript and the trigger-piRNA at positions 2-11, the "seed" [53], while others observed slicing of the targeted transcript even with 3 mismatches over positions 1-20 of the trigger-piRNA [54]. It is not known whether the strict pairing within a seed sequence is required for PIWI slicing in An. gambiae. In the An. gambiae ovary, we observed typical genome-unique responder-and phased trail-piRNAs originating from the network transcripts and identified corresponding putative trigger-piRNAs. From 12 different trigger events implicated in the snetDNA network in An. gambiae, only 5 show a perfect match over the whole putative seed, positions 2-11 (Supplementary Figure S3). For the rest of the trigger-piRNAs, perfect matching is observed in all cases around the slicing positions 10-11, while 1 to 3 mismatches are observed within the putative seed. We hypothesize that the strict pairing within the seed is not necessarily required for the slicing by the PIWI protein(s) involved in the network systems of An. gambiae. An. gambiae has three PIWI class proteins: Ago3 (AGAP008862), AGAP011204 and AGAP009509 [55]. We propose that at least one of these proteins does not need necessarily strict matching within the whole 2-11 seed for slicing activity but uses a smaller seed or allows some relaxed matching within the seed. We also observed that, even when base pairing starts at position 2 or 3, the slicing of the target always occurs between the nucleotides 10And 11 relative to the piRNA 5 end. This is consistent with what has been observed in mouse testes [54].
Our present data suggest that An. gambiae mRNAs and lncRNAs are processed into piRNAs that are able to target other transcripts and direct them into the piRNA pathway, thus creating piRNA-mediated regulation networks. Such networks are based on the presence within the network transcripts of short repeated sequences, the snetDNAs, which are conserved from insects to mammals, notably in the 38 tested species of disease-vectors (mosquitoes, tsetse flies, sand fly, tick, mite, body louse, bed bug and snail). Due to the production of piRNAs from snetDNAs, networks of interdependent regulation are established between different genes and lncRNAs and may control their transcripts' stability/slicing within specific tissues where piRNA pathway and ping-pong amplification are known to be active. It is important to note that such regulatory pathways may function with any repeated sequences that produce piRNAs. Still, the conservation of specific snetDNA consensus sequences indicates an evolutionary and functional advantage of these specific sequences.
Targeting of transcripts by trigger-piRNAs and subsequent slicing may interfere with the expression of the targeted transcripts. AGAP003387 transcripts are part of the An. gambiae snetDNA network evidenced here. Remarkably, it has been shown that the level of AGAP003387 transcripts was increased in piRNA-pathway-impaired females, in which PIWI class proteins Ago3, AGAP011204 and AGAP009509 were knocked down. These results obtained by knocking-down independently the three different PIWI class proteins show that AGAP003387 is regulated by the piRNA pathway [23]. Our data now explain why this is the case: knock-down of these PIWI-class proteins impairs piRNA-targeting of the AGAP003387 transcript and the subsequent slicing, thus leading to higher AGAP003387 transcript levels. Taken together, these data provide evidence of the functionality of the An. gambiae snetDNA network reported here.
In mouse testes, piRNA-guided cleavage of mRNAs has been demonstrated by Zhang et al. [54]. The functional impact of piRNA-guided mRNA cleavage is further evidenced as these authors report that potential piRNA-targeted mRNAs directly interact with Miwi, the mouse Piwi protein, and show higher expression levels in the testes of Miwi catalytic mutant mice compared to wildtype. Gainetdinov et al. [56] report that the mechanism of piRNA biogenesis possibly is the same in most animals. Together with our data, these observations suggest that similar mechanisms of piRNA-mediated mRNA regulation possibly exist in most animal species.

Conclusions
In this study, we detail networks in An. gambiae and in Ae. aegypti, but it may be anticipated that many more will be discovered through further analyses. As snetDNAs, very short repeated sequences, are sufficient to establish trigger/responder-piRNA-based regulatory networks in An. gambiae, it is likely that other repeats, TE-related or not, inserted within genes may assure additional regulatory networks based on piRNA biogenesis. It is conceivable that target genes with such repeats are repressed in normal environmental conditions because their transcripts are sliced for piRNA production. Their repression may vary in a concerted manner if epigenetic events affect the piRNA pathway, contributing to a possible rapid plasticity in their expression levels.