In Silico Identification and Characterization of circRNAs as Potential Virulence-Related miRNA/siRNA Sponges from Entamoeba histolytica and Encystment-Related circRNAs from Entamoeba invadens

Ubiquitous eukaryotic non-coding circular RNAs regulate transcription and translation. We have reported full-length intronic circular RNAs (flicRNAs) in Entamoeba histolytica with esterified 3′ss and 5′ss. Their 5′ss GU-rich elements are essential for their biogenesis and their suggested role in transcription regulation. Here, we explored whether exonic, exonic-intronic, and intergenic circular RNAs are also part of the E. histolytica and E. invadens ncRNA RNAome and investigated their possible functions. Available RNA-Seq libraries were analyzed with the CIRI-full software in search of circular exonic RNAs (circRNAs). The robustness of the analyses was validated using synthetic decoy sequences with bona fide back splice junctions. Differentially expressed (DE) circRNAs, between the virulent HM1:IMSS and the nonvirulent Rahman E. histolytica strains, were identified, and their miRNA sponging potential was analyzed using the intaRNA software. Respectively, 188 and 605 reverse overlapped circRNAs from E. invadens and E. histolytica were identified. The sequence composition of the circRNAs was mostly exonic although different to human circRNAs in other attributes. 416 circRNAs from E. histolytica were virulent-specific and 267 were nonvirulent-specific. Out of the common circRNAs, 32 were DE between strains. Finally, we predicted that 8 of the DE circRNAs could function as sponges of the bioinformatically reported miRNAs in E. histolytica, whose functions are still unknown. Our results extend the E. histolytica RNAome and allow us to devise a hypothesis to test circRNAs/miRNAs/siRNAs interactions in determining the virulent/nonvirulent phenotypes and to explore other regulatory mechanisms during amoebic encystment.


Introduction
From their discovery 45 years ago [1,2], the widely conserved circular transcripts or circRNAs have been extensively studied, displaying important features such as exonucleaseresistance and ability to interact with proteins and other RNAs [3]. CircRNA molecules can be monoexonic, multi-exonic, exon-intronic, or intronic generated by co-transcriptional backsplicing, in which an upstream 3 ss acceptor is linked to a downstream 5 ss donor producing exonic circular molecules. Skipped exons within a lariat produced from linear splicing resulting may undergo post-transcriptional backsplicing originating exon-intron circRNAs (EIciRNAs) [4][5][6].
In keeping with their localization, circRNAs have different functions. Those with exonic sequences localize mainly to the cytoplasm and perform regulatory functions such as microRNAs and protein sponging, translation regulation, mRNA stability, and 5 capindependent translation [10,[14][15][16][17][18]. Intronic circRNAs localize heterogeneously within the cell. In the nucleus, they are linked to gene expression regulation, while the function of cytoplasmatic intron circles remains unknown [19,20]. Some nuclear EIciRNAs are part of the RNA polymerase II (Pol II) binding complex able to downregulate their parental genes [4,21]. This is the case of the human 2 -5 intronic circRNAs (ciRNAs), ci-ankrd52, and ci-sirt7, which directly bind to the serine 2-phosphorylated C-terminus domain (CTD) of Pol II, indicating that ciRNA-CTD interaction occurs during transcription elongation [21].
Shortly following its release from the spliceosome, a lariat becomes a circular intronic molecule after trimming off the 3 -end tail and before it is hydrolyzed by the debranching enzyme (Dbr1). However, numerous introns show cis-elements that confer them stability and accumulate inside the cell, like the stable sequence introns (sisRNAs) [19]. SisRNAs include 2 -5 covalently linked intronic circular RNAs, which are basically processed lariats, and 3 -5 covalently linked species with untrimmed 3 -end tails.
Entamoeba histolytica is the causative agent of amebiasis, a neglected tropical disease that still accounts for 55,500 deaths and nearly 2.4 million disability-adjusted life years [22]. Much effort has been devoted to understanding the mechanisms leading to infection establishment and development of disease and to treating such infections as well. In search of alternative theragnostic target molecules, we recently identified full-length intronic circular RNAs (flicRNAs) in the human protozoan parasite E. histolytica [23]. In agreement with previous reports [24], the 5 ss and 3 ss ends of these molecules are covalently linked with an average of 10 nucleotides between the branch point (BP) and the 3 ss in their lariat precursors [23]. We observed that flicRNAs are post-splicing products originating from the lariat by mechanisms not fully understood. In vivo splicing assays showed that the conserved GU-rich 5 ss is required for flicRNAs biogenesis. Furthermore, mutation of such elements decreased flicRNAs production and increased expression of their parental genes, indicating their role in the gene expression regulation [23]. Strikingly, although circular RNAs are expressed throughout the biota, they have been reported in a small number of protozoans [25]. Notwithstanding, other non-coding (nc) RNAs are expressed in Amoebozoa. For example, although not fully characterized, Mar-Aguilar et al. [26] confirmed the presence of microRNA-like molecules that had been predicted bioinformatically in E. histolytica [27]. miRNAs have been identified in Dictyostelium discoideum also [28]. Other ncRNAs in Entamoeba are the stress-induced self-circularized 5 -external transcribed spacer rRNAs in E. histolytica [29] and the encystation-related long ncRNAs in E. invadens [30].
To cope with protist and bacterial infections, recent works proposed that hosts' immune response trigger circRNAs-miRNAs-mRNA regulatory networks [31][32][33]. However, little is known about whether these regulatory networks exist in E. histolytica and if they enable the onset of parasitic infections and invasive diseases. To address this question, we mined available RNA-seq libraries in search of circular exonic transcripts with miRNA sponging potential differentially expressed (DE) in HM1:IMSS (virulent) and Rahman (avirulent) E. histolytica strains. We also searched for circular transcripts in the reptile parasite E. invadens. After validation of the software using synthetic decoy sequences with bona fide back splice junctions (BSJ), we identified 188 and 605 reverse overlapped (RO) circRNAs from the E. invadens and E. histolytica libraries, respectively. The sequence composition of the identified circRNAs was mostly exonic. A total of 416 circRNAs from E. histolytica were virulent-specific and 267 were avirulent-specific. In addition, out of the common circRNAs, 32 were DE between strains, showing no correlation between intron length or the number of introns per locus with the number of circRNAs produced in a given locus. Finally, we predicted that eight of the DE circRNAs could function as sponges of the reported miRNAs in E. histolytica, whose functions are still unknown. Our results allow us to devise a working hypothesis to test the relationships between circRNAs and miRNAs in determining virulent/nonvirulent phenotypes and to explore additional regulatory mechanisms during amoebic trophozoite to cyst differentiation.

Computational Identification of Entamoeba circRNAs
CircRNAs were identified based on backspliced RO reads as described previously [34] ( Figure S1 depicts the workflow used here). To identify the circRNAs in E. histolytica and E. invadens, available RNAseq libraries [35,36] were mined using the CIRI-full package. Respectively, six E. histolytica (run access codes ERR058005-ERR058007 for the virulent HM1:IMSS strain and ERR058008-ERR058010 for the avirulent Rahman strain) and four E. invadens (run access codes GSM4108195-GSM4108198) libraries were used, and the access codes for the reference genomes were GCF_000208925 and GCA_000330505.
Despite the robustness of the CIRI-full package [34] to ensure that it could identify circRNAs derived from an AT-rich genome such as E. histolytica, we performed a prospective run in a fraction of the libraries and found a putative circRNA from locus EHI_110790. Next, we used the exon end sequences as back splice junctions and manually added random sequences from the same locus to obtain 10 circular RNA decoys with displaced backsplice junctions. These sequences were incorporated into the partial libraries, and circRNAs were re-identified. In addition to the previous results, all decoy sequences were rescued in the analysis ( Figure S2).

E. histolytica circRNAs
In E. histolytica, 958 circRNAs were identified. After filtering for duplicates, 605 circRNAs remained. Of these, 433 were found in the HM1:IMSS virulent strain and 276 in the Rahman avirulent strain; 104 of these circRNAs were shared in both strains; therefore, 329 circRNAs were virulent-specific, and 172 were avirulent-specific. A total of 20 out of 605 circRNAs originated from intergenic regions (17 were from the virulent strain, 9 were from the avirulent strain, and they shared 6 circRNAs), and 585 circRNAs were associated with a particular locus (Table 1). Notably, only 35 of the 585 circRNAs were formed by two exons. The length distribution of E. histolytica circRNAs was limited, ranging from nearly 50 to 250 nucleotides (nt), peaking at 140 nt ( Figure 1A), and the expression of circRNAs was practically proportional to the expression of their respective linear RNAs ( Figure S3). The length distribution of E. histolytica circRNAs was limited, ranging from nearly 50 to 250 nucleotides (nt), peaking at 140 nt ( Figure 1A), and the expression of circRNAs was practically proportional to the expression of their respective linear RNAs ( Figure S3). Despite the E. histolytica libraries were previously analyzed, we wanted to ensure the detection of differential expression (DE) of circRNAs between the virulent and avirulent strains. To this end, the normalized expression of circRNAs between samples was analyzed. Box plot analysis and principal components analysis (PCA) both show that the circRNAs from the three avirulent replicas have similar distribution and variation, whereas the circRNAs from one of the virulent samples are distributed slightly different and appear to have more variation than the other two samples, while separated from the avirulent circRNAs ( Figure S4). The normalized expression counts were analyzed using DEseq2 library and the statistic p value and a fold change of 2 were determined. A volcano Despite the E. histolytica libraries were previously analyzed, we wanted to ensure the detection of differential expression (DE) of circRNAs between the virulent and avirulent strains. To this end, the normalized expression of circRNAs between samples was analyzed. Box plot analysis and principal components analysis (PCA) both show that the circRNAs from the three avirulent replicas have similar distribution and variation, whereas the circRNAs from one of the virulent samples are distributed slightly different and appear to have more variation than the other two samples, while separated from the avirulent circRNAs ( Figure S4). The normalized expression counts were analyzed using DEseq2 library and the statistic p value and a fold change of 2 were determined. A volcano plot was constructed to evidence that the circRNAs were underexpressed (green dots) and overexpressed (red dots) in the E. histolytica virulent strain with respect to the avirulent strain ( Figure 2); thus, an overexpressed circRNA in the virulent strain is underexpressed in the avirulent strain and vice-versa. Usually, circular RNA expression is lower than their linear counterparts, and for this reason, some authors prefer lower p values to assess DE [37]. However, in E. histolytica, the abundance of circRNAs expression is heterogeneous; therefore, we used two p values. Using p < 0.05, twenty DE circRNAs were identified, and twelve more were detected when a p < 0.1 was used. The plot shows the 32 DE circRNAs identified using both p values. Interestingly, three circRNAs are particularly underexpressed, all of them derived from locus EHI_169670 (a gene with three exons and two introns); one circRNA has one exon element, and the other two have two exonic elements.
in the avirulent strain and vice-versa. Usually, circular RNA expression is lower t linear counterparts, and for this reason, some authors prefer lower p values to a [37]. However, in E. histolytica, the abundance of circRNAs expression is hetero therefore, we used two p values. Using p < 0.05, twenty DE circRNAs were identi twelve more were detected when a p < 0.1 was used. The plot shows the 32 DE c identified using both p values. Interestingly, three circRNAs are particularly u pressed, all of them derived from locus EHI_169670 (a gene with three exons introns); one circRNA has one exon element, and the other two have two exonic e Figure 2. Differentially expressed E. histolytica circRNAs in HM1:IMSS strain with respe man strain. The volcano plot shows the DE circRNAs (p < 0.05 and p < 0.1 data are com circRNAs were identified: 20 overexpressed (red dots) and 12 underexpressed (green dot Next, we used the virulent-specific and the avirulent-specific circRNAs to p gene ontology (GO) analysis focusing on metabolic processes. Clearly, the metab cesses potentially regulated by the circRNAs from virulent and avirulent E. h strains substantially differ (respectively, Figure 3A,B and Tables S1 and S2). For 35 GO terms were best represented in the HM1:IMSS strain, but in the Rahman str 47. Notably, 26 of these terms were represented in both strains, but their abund significance were differentially distributed. For example, taking into account the sability of the metabolic processes only, translational elongation, ATP metab cesses, cellular nitrogen compound metabolic process, and gene expression appe ferent X coordinates between strains. Additionally, biosynthetic processes, the ge of precursor metabolites and energy, as well as purine-containing compound m processes were distributed diametrically opposed. However, as expected from m processes of the same species with different virulent attributes, the term prima bolic process showed little change between strains. This set of data suggests tha Next, we used the virulent-specific and the avirulent-specific circRNAs to perform a gene ontology (GO) analysis focusing on metabolic processes. Clearly, the metabolic processes potentially regulated by the circRNAs from virulent and avirulent E. histolytica strains substantially differ (respectively, Figure 3A,B and Tables S1 and S2). For example, 35 GO terms were best represented in the HM1:IMSS strain, but in the Rahman strain were 47. Notably, 26 of these terms were represented in both strains, but their abundance and significance were differentially distributed. For example, taking into account the dispensability of the metabolic processes only, translational elongation, ATP metabolic processes, cellular nitrogen compound metabolic process, and gene expression appear in different X coordinates between strains. Additionally, biosynthetic processes, the generation of precursor metabolites and energy, as well as purine-containing compound metabolic processes were distributed diametrically opposed. However, as expected from metabolic processes of the same species with different virulent attributes, the term primary metabolic process showed little change between strains. This set of data suggests that the DE circRNAs might be involved in gene expression regulation in the virulent and avirulent phenotypes.  When we searched for circRNAs sponging potential, remarkably, eight of the twenty DE circRNAs were predicted to have one or more Ehi-miRNA-like target sites (Table 2), suggesting the possibility of circRNA-miRNA-mRNA regulatory mechanisms existing in Entamoeba. For instance, according to their prediction score, the GRIP domain RUD3 circR-NAs 01_169670 and 18_169670 could potentially sponge Ehi-miR-18, Ehi-miR-136, Ehi-miR-160, Ehi-miR-50, Ehi-miR-88, and Ehi-miR-82, and Ehi-miR-17, Ehi-miR-18, Ehi-miR-160, and Ehi-miR-50, respectively. When we searched for circRNAs sponging potential, remarkably, eight of the twenty DE circRNAs were predicted to have one or more Ehi-miRNA-like target sites (Table 2), suggesting the possibility of circRNA-miRNA-mRNA regulatory mechanisms existing in Entamoeba. For instance, according to their prediction score, the GRIP domain RUD3 circRNAs 01_169670 and 18_169670 could potentially sponge Ehi-miR-18, Ehi-miR-136, Ehi-miR-160, Ehi-miR-50, Ehi-miR-88, and Ehi-miR-82, and Ehi-miR-17, Ehi-miR-18, Ehi-miR-160, and Ehi-miR-50, respectively.   1 The list includes those detected with p-value < 0.05 only; 2 C (circRNA)/L (linear RNA composed of spliced and unspliced variants) overexpressed (↑), underexpressed (↓) or unchanged (-); 3 DE, differential expression of the virulent HM1:IMSS strain with respect to the avirulent Rahman strain; 4 The numbers represent the Ehi-miRNAlike molecules as described by Mar-Aguilar et al. [26], possibly sponged by the corresponding circRNA, ordered according to the highest target probability; 5 Hypothetical protein; 6 Not determined/available.
Lastly, to verify that circRNAs are expressed in E. histolytica, we used RT-PCR to amplify circRNA 01_169670 (Figure 4). Sequence analysis confirmed the identity of the circRNA.   1 The list includes those detected with p-value < 0.05 only; 2 C (circRNA)/L (linear RNA composed of spliced and unspliced variants) overexpressed (↑), underexpressed (↓) or unchanged (⎯); 3 DE, differential expression of the virulent HM1:IMSS strain with respect to the avirulent Rahman strain; 4 The numbers represent the Ehi-miRNA-like molecules as described by Mar-Aguilar et al. [26], possibly sponged by the corresponding circRNA, ordered according to the highest target probability; 5 Hypothetical protein; 6 Not determined/available.
Lastly, to verify that circRNAs are expressed in E. histolytica, we used RT-PCR to amplify circRNA 01_169670 (Figure 4). Sequence analysis confirmed the identity of the circRNA.

E. invadens circRNAs
In 20 h, encystment induced E. invadens, 188 circRNAs were identified, and after duplicates elimination 143 remained. All of these circRNAs were formed of single elements, where 10 of them were intergenic, 101 were exonic, and 2 were intronic ( Table 1). The length distribution of E. invadens circRNAs was broad, ranging from nearly 130 to 710 nt, and peaked at 250 nt ( Figure 1B).
Since mature cyst formation is reached after 72 h incubation in an encystment medium, the circRNAs found in trophozoites transcriptomes were also filtered, including the intergenic circRNAs whose cell-stage expression profile cannot be univocally assigned, thus 40 cyst-specific circRNAs were identified (Tables 1 and 3). Interestingly, three circR-NAs are E. invadens-specific (the Jacob coding EIN_023210, the cadmium metallothionein coding EIN_381500, and the high mobility protein B3 locus EIN_284560), and the rest of the circRNA-producing loci have corresponding orthologs with other amoebozoans. Three circRNAs have orthologs with the free-living Mastigamoeba balamuthi (EIN_061000, EIN_369710, and EIN_475930), and fifteen have more than one E. histolytica orthologs. Remarkably, locus EIN_461630 which codes for the splicing factor U1-70 kDa, produced three different circRNAs and has two E. histolytica orthologs.  Figure 3C). Fifty-four metabolic processes are represented in encysting amoebas indicating that distinct metabolic processes are involved during cell differentiation in Entamoeba. When we compared the GO terms of 20 h cysts with the virulent strain (Table S3), again, we observed that they differed both in dispensability and distribution. For example, the metabolic process was in opposite X-axis coordinates, and the small molecule metabolic process was diametrically opposed.

Discussion
Here we bioinformatically identified circular RNA molecules from exons and intergenic regions of E. histolytica and E. invadens. Given the available suboptimal libraries used here, the number of circRNAs detected is more likely a sub-estimation. In agreement, we interpret that the lower number of circRNAs identified in E. invadens might not reflect the actual repertoire. For example, Mar-Aguilar and coworkers identified 199 Ehi-miRNA-like molecules [26]; however, when small RNA-specific libraries were analyzed, nearly half a million reads of 27nt small RNAs were identified in E. histolytica and, depending on the differentiation stage, from over 1.2 to nearly 1.9 million reads in E. invadens [38]. However, because of their structure, biogenesis, and half-life, we do not expect such high numbers for the circRNA repertoire.
Ninety-seven percent of the E. histolytica circRNAs without duplicates are exonic and constituted of a single element, and only 3% are intergenic (Table 1). We suspect that the intron mean size in the amoebic genome was the main cause for the null detection of intron-derived circRNAs. In E. invadens, in keeping with its gene structure, 6.99% intergenic and 1.3% intronic circRNAs were identified ( Table 1). The same holds true for the distribution lengths of their respective circRNAs; E. invadens circRNAs double in length E. histolytica (Figure 1).
Mostly one circRNA per locus was identified. However, 44 loci from the virulent and 28 loci from the avirulent E. histolytica strains produce two or more circRNAs and up to nine circRNAs for locus EHI_169670 (Table 2 and repository). Likewise, seven loci of encystation-specific E. invadens also produce additional (up to five) circRNAs (Table 3). Additionally, in agreement with apparent stochastic alternative 5 ss and 3 ss selection reported in the forward-splicing of E. histolytica introns [35], we observed different BSJ of some circRNAs derived from a given locus. This suggests that certain circRNAs are required in larger molar quantities to achieve their roles or that they might be involved in additional metabolic tasks.
Significantly, different from the human circRNAs [39,40], the expression of Entamoeba circRNAs positively correlates with the expression of their linear counterparts (Supplementary Figure S2). Furthermore, whereas human circRNAs originate from longerthan-average exons flanked by large introns [41,42], in both Entamoeba species, we observed no correlation between intron length or the number of introns per locus with the number of circRNAs produced in a given locus (Tables 2 and 3, repository). Altogether, these data support their suggested role in gene or protein expression regulation, as has been demonstrated in the intronic flicRNAs [23,43].
As expected, we observed the DE of circRNAs between E. histolytica virulent and avirulent strains and detected circRNAs during E. invadens cell differentiation as well ( Figure 3). Our data are in agreement with previous observations for amoebic small ncRNAs [38,44,45] and lncRNAs [30,46]. Interestingly, circRNA 14_116360, underexpressed in the virulent strain (i.e., overexpressed in the avirulent strain), originated from a locus that codes for a Serine-rich protein implicated in abiotic stress, whose phosphorylation might be controlled by ncRNAs [47]. Other circRNAs overexpressed in the Rahman strain correspond to loci coding for GRIP domain RUD3 proteins involved in the Golgi Apparatus structuring and trafficking (01_169670, 02_069670, 03_069670, 11_ 014170, and 18_069670) and to loci coding for the S20 and L27 ribosomal proteins (12_026410 and 16_183480, respectively). Therefore, one-quarter of the DE circRNAs (i.e., five out of seven of those overexpressed in the avirulent strain) originate from two loci coding for GRIP domain RUD3, suggesting that significant cell trafficking might be implicated in the control of core metabolic processes to maintain the avirulent phenotype, and probably in the virulent phenotype as well. This general feature reflects the differences in gene expression regulation required in distinct virulence and encystment phenotypes, encompassing all levels of control, from transcription to translation and protein modification.
Furthermore, the differences in GO terms for biological processes such as gene expression, translation elongation, ATP-metabolic process, cellular nitrogen and generation of metabolite precursors, and energy generation processes between E. histolytica strains and encysting E. invadens support this notion. The fact that some primary metabolic processes do not change between strains strengthens this view, indicating that not all biological processes are differentially regulated between the virulent and avirulent phenotypes.
Noteworthy is the encystment-specific circRNAs detected here ( Table 3). As expected, and casting insights of the value of E. invadens as an encystment model, three E. invadensspecific circRNAs were identified. Locus EIN_023210 codes for protein Jacob, locus EIN_284560 codes for a putative high mobility group B3 protein/SWI/SNF related chromatin-binding protein, and locus EIN_381500 codes for a putative cadmium metallothionein precursor. Jacob is the most abundant glycoprotein that has chitin-binding domains and is thus an integral part of the cyst wall [48]. The fact that three loci have Mastigamoeba balamuthi orthologs only (Table 3) reflects the deep-branching pre-parasitic origins of encystment-related circRNAs, which may be possibly involved in encystment regulation.
When we searched for anti-Eh-miRNA target sites within the identified circRNAs, we found that eight circRNAs contain at least one and as many as six target sites for Eh-miRNAs. Probably, this number is an underestimation since we used very stringent base complementary parameters. This finding suggests a sponging function of these circRNAs, a possibility that we are currently investigating. Nevertheless, because the putative sponging circRNAs are differentially expressed in virulent amoebas (circRNAs 01_169670, 14_116360, and 18_169670 are underexpressed, and circRNAs 04_130700, 06_ DS571557, 07_036530, 19_125950, and 20_ DS571344 are overexpressed) we can envisage a circRNA-Eh-miRNA-mRNA virulence/avirulence regulatory network in which, for example, the translation of a virulence-related mRNA could be repressed by an avirulent regulatory Eh-miRNA-like molecule, and such translation repression could be relieved by the overexpression of a virulent regulatory circRNA, which will titter the avirulent regulatory Eh-miRNA (or silencing RNA molecules), and vice-versa. Recently, additional small RNA and antisense Entamoeba databases have been reported [38,49]. However, a statistically significant miRNA/small RNA DE comparison between strains has proved unmanageable, masking any DE relationship between circRNA and miRNA-like molecules in these strains.
So far, regulatory networks have been predicted at the transcriptional level using the Bayesian inference [50]. Here, we propose additional regulatory networks impacting posttranscriptional expression events. Central to these regulatory networks include circRNAs that, in addition to sponging miRNA/siRNA molecules, could also interact with other ncRNAs or transcription or translation regulatory proteins and even may be translated into proteins.
In conclusion, we identified 143 and 605 reverse overlapped circRNAs from the E. invadens and E. histolytica libraries, respectively. The identified circRNAs were mostly exonic. 416 circRNAs from E. histolytica were virulent-specific, 267 were avirulent-specific, and 32 were shared between strains. From the latter, 32 were DE between strains, showing no correlation between intron length or the number of introns per locus with the number of circRNAs produced in a given locus. Finally, we predicted that eight of the DE circRNAs could function as sponges of the reported miRNAs in E. histolytica, whose functions are still unknown, although other epigenetic regulatory roles must be also explored. Our results allowed us to devise a working hypothesis to test the relationships between circRNAs and miRNA-like molecules in determining virulent/nonvirulent phenotypes and to explore additional regulatory mechanisms during amoebic encystment. Furthermore, the in silico prediction of circRNAs in this study would facilitate a deeper comprehension of regulatory epigenetic processes in such pathogenic amoebic protozoa. However, E. histolytica virulence is widely variable worldwide across several strains, and the relevant mechanisms controlling the differential virulent expression need to be investigated more prospectively.

Identification of Linear and Circular RNAs
The quality control of the RNA libraries was made using the FASTQC (v0.11.8) and Trimmomatic (v0.39) software packages. To detect and quantify the linear RNAs, we improved the STAR-HTSeq pipeline. STAR software (v2.7.10a) [51] was used to make the indexed, mapping, and annotation of linear RNAs. To quantify the expression of the linear RNAs was made by the HTSeq software with the htseq-count module. circRNAs were detected using the software CIRI-full (v2.1.1) [34,52]. For the indexing and mapping of the RNA libraries, we used the program BWA (v0.7.17), as recommended by the authors of CIRI2. The quantification and annotation were made with the RO1 and RO2 modules of CIRI-full as recommended using the author's pipeline.

Differential Expression of RNAs
To make the differential expression of linear RNAs and circRNAs, we used the R-base (v4.1), Rstudio IDE (v1.1.4), and the R libraries DEseq2 and ggplot2 to build the normalization of the expression counts, differential expression, statistical analysis, and plots.

Gene Ontology Analysis
GO analyses were carried out using the AmoebaDB data analysis suite. The gene ID of all detected circRNAs in both E. histolytica strains and E. invadens were uploaded and analyzed by the amoebaDB web (https://amoebadb.org) (accessed on 16 August 2022) to obtain the annotation, GO terms and GO enrichment analysis for each locus. Finally, with this data, we used the Revigo web tool (http://revigo.irb.hr) (accessed on 16 August 2022) to build the GO enrichment graph, and, with Rstudio IDE (v1.1.4), the graph was visualized.

Identification of circRNAs as miRNAs Sponges
To assess the prediction of circRNAs as sponges of miRNAs, we used the software intaRNA [53], which is capable of detecting RNA-RNA interactions through an alignment. The parameters used to make the predictions were a seed sequence of at least 7 nucleotides, canonical interactions between miRNA and circRNA in nucleotides 1 to 5, the possibility of gaps in the alignment, and favorable thermodynamics interaction ∆G < −7 Kcal/mol.

CIRI-Full Control
We designed in silico RNA decoy reads from exon end sequences of the locus EHI_110740 of E. histolytica. These reads were constructed by fragments of this locus to make 10 RNA library decoy reads with a length of 100 nucleotides which includes a BSJ. Additionally, we prepared the RNA libraries extracting 10% of all RNA reads, and we identified the