Small RNAs (sRNAs) can act either transcriptionally, by directing epigenetic modifications at TE loci, or post-transcriptionally, through targeted RNA degradation. sRNAs of the PIWI-interacting RNA (piRNA) class are the most potent silencers of TEs in germline cells [
24]. In somatic tissues, two additional classes of small RNAs contribute to TE silencing: short interfering RNAs (siRNAs) derived from expressed transposon transcripts [
81] and the more recently described 3′ tRNA-derived fragments (3′ tRFs) [
96]. Accurately quantifying sRNA production is therefore pivotal to the study of transposon biology. To this end, several packages have been released to investigate sRNA classes, which are particularly challenging to analyze when derived from repetitive loci in the genome because they are short, typically 18–36 nucleotides. Some methods consider sRNA classes separately; however, several packages (e.g., unitas [
97] and TEsmall [
98]) consider sRNA classes comprehensively to facilitate proper normalization of heterogeneous sRNA libraries, and to facilitate differential expression analysis across classes while taking into consideration ambiguously mapped reads.
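One common way to handle ambiguously mapped reads while keeping all sRNA classes on a shared scale can be sketched as follows. This is a minimal illustration, not the procedure used by unitas or TEsmall: the input format (`read_hits`), the equal-split rule, and both function names are assumptions made for the example.

```python
from collections import defaultdict


def fractional_class_counts(read_hits):
    """Split each read equally among the sRNA classes of its alignments.

    read_hits maps a read ID to a list of class labels, one per alignment
    (hypothetical input format, not the output of any specific tool).
    """
    counts = defaultdict(float)
    for classes in read_hits.values():
        if not classes:
            continue  # unaligned read: contributes nothing
        share = 1.0 / len(classes)
        for label in classes:
            counts[label] += share
    return dict(counts)


def reads_per_million(counts):
    """Scale class counts by the total over ALL classes (one common library size)."""
    total = sum(counts.values())
    if not total:
        return {}
    return {label: 1e6 * value / total for label, value in counts.items()}


hits = {
    "read1": ["miRNA"],            # uniquely mapped
    "read2": ["piRNA", "siRNA"],   # two alignments in different classes
    "read3": ["piRNA", "piRNA"],   # two alignments in the same class
}
counts = fractional_class_counts(hits)
rpm = reads_per_million(counts)
```

Because every class is divided by the same total, values for different classes remain directly comparable within a library.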
2.2.1. Tools for the Analysis of Multiple sRNA Types
In 2008, Moxon et al. presented a toolkit for analyzing large-scale plant small RNA datasets [
107], especially micro RNAs and trans-acting siRNAs (ta-siRNAs) that can both induce post-transcriptional silencing of target genes. The web-based tools can identify mature miRNAs and their precursors, compare sRNA expression profiles under varying conditions or between mutants and wild-type, and predict ta-siRNA. The successor of the toolkit is the UEA sRNA workbench [
108], a downloadable suite of tools that provides a user-friendly platform to create workflows for processing sRNA next-generation sequencing data. The workbench offers enhanced versions of its predecessor’s functions, complemented by easily accessible visualization tools.
MiRanalyzer [
109] is a web server tool designed to analyze the growing volume of sRNA deep-sequencing data. Using a list of unique miRNA reads and their expression levels, MiRanalyzer detects the known miRNAs, maps the remaining reads against transcribed sequences, and predicts new microRNAs. The tool is based on a random forest learning scheme that employs a selection of features (associated with nucleotide sequence, structure, and energy) chosen by their information gain. The prediction model was built on datasets from human, rat, and
C. elegans and reaches AUC values of 97.9% and recall values of up to 75% on unseen data.
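Feature selection by information gain can be illustrated with a short sketch. It assumes discrete (already binned) feature values and is not MiRanalyzer's code; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


labels = ["miRNA", "miRNA", "other", "other"]
informative = [1, 1, 0, 0]    # separates the two classes perfectly
uninformative = [1, 0, 1, 0]  # carries no information about the class
gain_hi = information_gain(informative, labels)
gain_lo = information_gain(uninformative, labels)
```

Features would then be ranked by their gain and the top-scoring ones retained for the classifier.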
SeqBuster [
110] is another web-based tool that analyzes large-scale sRNA datasets, and the first to characterize isomiRs [
111] (miRNA variants that usually arise from enzymatic 5′- or 3′-trimming, 3′ nucleotide addition, or nucleotide substitution) [
112,
113,
114,
115]. The package performs a variety of analyses, including identification of sRNAs, their length and frequency distributions, and comparison of the expression levels of sRNA loci between samples. Application of SeqBuster to small-RNA datasets from human embryonic stem cells revealed that most miRNAs present different types of isomiRs, some of which are associated with stem cell differentiation. The authors also provide a stand-alone version, which allows for annotation against any custom database.
DARIO [
116] is a web service for the study of short-read data from sRNA-seq experiments. Using mapped reads as input, DARIO performs quality control, overlaps the reads with user-selected gene models, quantifies RNA expression based on annotated ncRNAs from different ncRNA databases, and predicts new ncRNAs via a random forest classifier. DARIO supports the following assemblies as reference genomes: human (hg18 and hg19), mouse (mm9 and mm10), rhesus monkey, zebrafish,
C. elegans, and
D. melanogaster.
miRDeep2 [
117] is a user-friendly update and extension of miRDeep [
118], an algorithm that uses a probabilistic model of miRNA biogenesis to score how well sequenced RNAs fit that model. The improved algorithm identifies canonical and non-canonical miRNAs, such as those derived from TEs, and reports high-confidence candidates that are detected in at least two independent samples. miRDeep2 was tested on high-throughput sequencing data from seven animal species representing the major animal clades. In all clades tested, the algorithm identified miRNAs with high accuracy (98.6–99.9%) and sensitivity (71–90%), and it reported hundreds of novel miRNAs.
mirTools [
119] is a web service for the classification of sRNAs, annotation of known miRNAs based on NGS data, prediction of novel miRNAs, and identification of differentially expressed miRNAs. A few years later, its improved version, mirTools 2.0 [
120], offered a series of additional features: detection and profiling of more types of ncRNAs (such as tRNAs, snRNAs, snoRNAs, rRNAs, and piRNAs), identification of miRNA-targeted genes, detection of differentially expressed ncRNAs, as well as a standalone version of the tools. However, the web server of mirTools is currently inaccessible.
ShortStack [
121] is a stand-alone application that allows for the analysis of reference-aligned sRNA-seq data and de novo annotation and quantification of the inferred sRNA genes. It provides highly specific annotation of miRNA loci in all tested plant (
Arabidopsis, tomato, rice, and maize) and animal (
D. melanogaster, mouse, and human) species. ShortStack reports on parameters relevant to sRNA gene annotation, such as size distributions, repetitiveness, strandedness, hairpin-association, miRNA annotation, and phasing. ShortStack uses modest computational resources and performs comparably to previously published tools (e.g., UEA sRNA workbench) when tested on an sRNA-seq dataset from wild-type
Arabidopsis leaves.
sRNAtoolbox [
122] is a web-interfaced set of interconnected tools, including expression profiling from deep sequencing data via the sRNAbench tool [
123], consensus differential expression, consensus target prediction, BLAST search against several remote databases, and visualization of (differential) sRNA expression. All tools can be used independently or for the exploration and downstream analysis of sRNAbench results. An updated version of the sRNAtoolbox [
124] features additions such as new reference genomes from Ensembl, bacteria and virus collections from NCBI, and microRNA reference sequences from MirGeneDB, as well as parallel launching of several jobs (batch mode).
Chimira [
125] is a web-based system for fast analysis of miRNAs from small RNA-seq data and identification of epi-transcriptomic modifications (5′- and 3′-modifications, internal modifications, or variation), based on which it can identify global microRNA modification profiles. The input sequences are automatically cleaned, trimmed, size selected, and mapped directly to miRNA hairpin sequences. Chimira offers a set of tools for the interpretation and visualization of the results that facilitates the comparative analysis of the input samples. The results from benchmarking show that Chimira offers faster execution than Oasis [
126].
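The cleaning, trimming, and size-selection steps that precede mapping can be sketched as below. This is a simplified illustration under stated assumptions, not Chimira's implementation: the adapter string is only an example, matching is exact (no mismatches tolerated), and the 18–36 nt window is borrowed from the typical sRNA length range mentioned earlier in this section.

```python
def trim_and_select(reads, adapter, min_len=18, max_len=36):
    """Clip the 3' adapter from each read and keep inserts within the size window.

    Exact adapter matching is an assumption made for brevity; real trimmers
    tolerate sequencing errors and partial adapter occurrences.
    """
    kept = []
    for read in reads:
        cut = read.find(adapter)
        insert = read[:cut] if cut != -1 else read
        if min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept


example_adapter = "TGGAATTCTCGGGTGCCAAGG"  # example sequence, chosen for illustration
raw_reads = [
    "A" * 22 + example_adapter + "CC",  # 22-nt insert: kept
    "ACGT" + example_adapter,           # 4-nt insert: too short, dropped
    "C" * 40,                           # no adapter, 40 nt: too long, dropped
]
clean_reads = trim_and_select(raw_reads, example_adapter)
```

The retained inserts are the sequences that would subsequently be aligned, for example to miRNA hairpins.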
The accurate annotation and analysis of short non-coding RNAs (sncRNAs) often requires the installation of multiple tools with possibly different technical limitations (e.g., dependencies, operating system). Gebert et al. [
97] developed unitas, a software tool that provides complete annotation in a manner suitable for non-expert users. Using a single tool also overcomes the problem of normalizing multiple-mapping sequences. unitas supports species with available ncRNA reference sequences in the Ensembl databases and is provided as precompiled standalone executables for Linux, Mac, and Windows systems.
Published in 2018, TEsmall [
98] is a software package that allows for simultaneous mapping, annotation, and relative quantification of a variety of sRNA types, including structural RNAs, miRNAs, siRNAs, and piRNAs, on a common scale. Thus, it enables the study of expression trends among different sRNA types and provides insight into the cross-talk between sRNA regulatory pathways. Given the appropriate annotation, TEsmall can provide the same functions for any novel type of sRNA. TEsmall can shed light on the complex regulatory networks of different types of sRNAs that act cooperatively, especially in the area of transposon silencing. It is known that siRNAs and piRNAs repress TEs in somatic cells and the germline, respectively; however, piRNAs have also been found to act in conjunction with siRNAs in this role, whereas in plants miRNAs might serve as an intermediate to form siRNAs [
127,
128,
129].
Oasis 2 [
130] is a web application useful for detecting and classifying sRNAs, as well as for analyzing their differential expression. Oasis 2 is a faster and more accurate version of Oasis [
126] (accuracy of around 87% for Oasis 2 versus 80% for the original application) that also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples and provides the option for interactively visualizing novel miRNAs and querying them against 14 supported genomes or the Oasis database of miRNAs and miRNA families.
sRNAPipe [
131] is a user-friendly pipeline that offers a range of analyses for small RNA-seq data. The pipeline performs successive steps of mapping small RNA-seq reads to chromosomes, TEs, gene transcripts, miRNAs, small nuclear RNAs, rRNAs, and tRNAs. It also provides individual mapping, counting, and normalization for chromosomes, TEs, and gene transcripts, and tests ping-pong amplification for putative piRNAs. sRNAPipe allows for the rapid and precise analysis of high-throughput data and it generates publication-quality figures and graphs. It is available in both the Galaxy Toolshed and via GitHub.
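A test for ping-pong amplification usually looks for an excess of sense/antisense read pairs whose 5′ ends overlap by exactly 10 nt. The sketch below counts such pairs for every overlap length; it is an illustration of the idea rather than sRNAPipe's code, and the coordinate convention (0-based, half-open) and the function name are assumptions.

```python
from collections import Counter


def overlap_histogram(plus_starts, minus_ends, max_overlap=30):
    """Count plus/minus read pairs by the length of their 5'-end overlap.

    plus_starts: 0-based start coordinates (5' ends) of plus-strand reads.
    minus_ends:  half-open end coordinates (5' ends) of minus-strand reads.
    Assumes every read is at least max_overlap nt long, so that the
    distance between the two 5' ends equals the overlap length.
    """
    minus_counts = Counter(minus_ends)
    histogram = Counter()
    for start, n_plus in Counter(plus_starts).items():
        for k in range(1, max_overlap + 1):
            n_minus = minus_counts.get(start + k, 0)
            if n_minus:
                histogram[k] += n_plus * n_minus
    return histogram


plus_starts = [100, 100, 250]
minus_ends = [110, 110, 110, 255]
hist = overlap_histogram(plus_starts, minus_ends)
```

In this toy input the 10-nt bin holds six pairs and the 5-nt bin one pair; on real data, a pronounced peak at 10 nt relative to the other bins is the signature of interest.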
Also based on the Galaxy framework, RNA workbench 2.0 [
132] is a comprehensive set of analysis tools and consolidated workflows. It integrates more than 100 tools useful in the field of RNA research, covering tasks such as RNA alignment, annotation, target prediction, and RNA-RNA interaction.
GeneTEflow [
133] is a fully automated, reproducible, and platform-independent workflow that allows the comprehensive analysis of both gene and locus-specific TE expression from RNA-Seq data, employing different technologies (Nextflow [
134] and Docker [
135]). The pipeline can be extended to include additional types of analysis such as alternative splicing and fusion genes.
Manatee [
136] is an algorithm for the quantification of sRNA classes. In contrast to many available sRNA analysis pipelines, Manatee rescues highly multimapping and unaligned reads based on available annotation and robust density information and is capable of identifying and quantifying expression from isomiRs and unannotated loci that could give rise to yet unknown sRNAs. Performance comparison on real and simulated data shows that other state-of-the-art methods (among them ShortStack [
121] and sRNAbench [
124]) tend to overestimate transcripts with zero abundance in the simulated dataset and underestimate/assign zero reads to expressed and highly expressed transcripts. On the other hand, Manatee estimates counts that are the closest to the simulated abundances and achieves high accuracy across diverse sRNA classes. Moreover, Manatee can be easily implemented in pipelines, and its output is suitable for downstream analyses and functional studies.
Di Bella et al. [
137] published an elaborate comparative analysis of eight pipelines on RNA-seq data, including Oasis 2, sRNAPipe, and the UEA sRNA workbench. Their systematic performance evaluation aims to establish guidelines for the selection of the most appropriate workflow for each ncRNA class.
2.2.2. Tools for the Analysis of PIWI-Interacting RNAs
PIWI-interacting RNAs (or piRNAs) are animal-specific RNAs that comprise the largest and most heterogeneous class of the small ncRNA (sncRNA) family, with over 2 million distinct piRNA species in mouse [
138]. They function as guides for PIWI proteins, a subfamily of Argonaute proteins. They are 21–35 nucleotides long and are processed from long single-stranded precursor transcripts that originate from genomic loci known as piRNA clusters. piRNA clusters have been found to contain remnants of transposons in arthropods, whereas in birds and mammals they encode long non-coding RNAs that are processed into piRNAs. In the majority of mammalian species, some piRNAs are involved in the protection of the germline genome against transposon mobilization. piRNAs are mostly not conserved among species [
139]. piRNAs were first identified as novel silencing RNAs in the
Drosophila melanogaster testis two decades ago [
140].
Active retrotransposition is more frequent in germ cells due to the epigenetic reprogramming that primes them for totipotency [
141]. To maintain the integrity of the genome passed on to the next generation, the metazoan germline exhibits the so-called piRNA pathway, an additional retrotransposon control based on small RNA-mediated recognition and endonucleolytic cleavage of the target TE transcripts [
142,
143]. piRNAs are loaded into PIWI proteins and thus target the TE transcripts by sequence complementarity. The TE transcripts are subsequently cleaved, producing secondary piRNAs, which constitutes the so-called “ping-pong” cycle in fruit fly [
59,
144,
145,
146,
147]. The presence and functions of piRNAs in somatic cells are not as well characterized; however, it is known that some piRNAs are common to the germline and the soma, some appear exclusively in the soma, whereas others are specific to individual tissue types [
148]. piRNAs were found to exhibit a bias for a “U” at the 1st position and an “A” at the 10th position [
149].
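The positional bias described above is straightforward to measure on a set of reads. The following minimal sketch reports the fraction of sequences with a U at position 1 and an A at position 10; the function name and the example sequences are made up for illustration.

```python
def nucleotide_bias(reads):
    """Return (fraction with U at position 1, fraction with A at position 10).

    DNA-alphabet reads are accepted: T is treated as U. Reads shorter than
    10 nt are skipped because position 10 does not exist for them.
    """
    seqs = [r.upper().replace("T", "U") for r in reads if len(r) >= 10]
    if not seqs:
        return 0.0, 0.0
    frac_1u = sum(s[0] == "U" for s in seqs) / len(seqs)
    frac_10a = sum(s[9] == "A" for s in seqs) / len(seqs)
    return frac_1u, frac_10a


reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # 1U and 10A
    "UACGUACGUCGGAUC",         # 1U only
    "GACGUACGUAGGAUC",         # 10A only
    "CCCCCCCCCCCC",            # neither
    "ACGU",                    # too short, ignored
]
bias = nucleotide_bias(reads)
```

For the toy input both fractions are 0.5; in a piRNA-enriched library a 1U fraction well above the 25% expected by chance would be consistent with the reported bias.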
The first bioinformatics tool for piRNA prediction was a
k-mer scheme (2011) [
150] which applied the Fisher discriminant algorithm to
k-mer (
k = 1 through 5) sequence features using small RNA data. The study trained the algorithm on datasets from non-piRNA and piRNA sequences of five model species (rat, mouse, human, fruit fly, and nematode), and it reports precision and sensitivity of over 90% and over 60%, respectively. The authors conclude that the method can be used to identify piRNAs of non-model organisms without complete genome sequences; however, the web server is currently out of order.
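To make the feature representation concrete, the sketch below builds a k-mer vector for k = 1 through 5, which has 4 + 16 + 64 + 256 + 1024 = 1364 entries per sequence. Whether the original scheme uses raw counts or frequencies is not stated here, so normalizing each k-block by the number of windows is an assumption, as is the function name.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_features(seq, k_max=5):
    """Concatenate k-mer frequency blocks for k = 1..k_max.

    Each block lists all 4**k possible k-mers in lexicographic order.
    Windows containing other symbols (e.g. N) are counted in the
    denominator but match no feature.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in range(1, k_max + 1):
        windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        for letters in product(ALPHABET, repeat=k):
            key = "".join(letters)
            features.append(counts.get(key, 0) / windows if windows else 0.0)
    return features


vector = kmer_features("AAUU")
```

A vector of this form can be passed to any standard classifier, such as a linear discriminant or an SVM.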
Pibomd (2014) [
151] is an SVM algorithm for piRNA identification based on motif discovery. Pibomd employed the computational biology tool Teiresias [
152] to identify motifs of variable length that appear frequently in mouse piRNA and non-piRNA sequences and developed an SVM classifier that uses those motifs as features. Training an imbalanced SVM classifier (Asym-Pibomd) on the same training and testing datasets yielded higher specificity but lower sensitivity than the balanced SVM classifier; its sensitivity and accuracy were nevertheless still higher than those of the
k-mer scheme [
150] on identifying mouse piRNAs. Analysis of the distribution of the motifs showed uniform distribution of motifs in the non-piRNA sequences but significant motif enrichment on the 5′- and/or 3′-end of the piRNA sequences. The web server allows users to upload multiple FASTA sequences and select the model for classification (balanced or imbalanced SVM classifier). The performance of the algorithm on datasets from five model species (rat, mouse, human, fruit fly, and nematode) is comparable to the
k-mer scheme [
150] (see
Table 1).
Genomic alignment of small RNA-seq data is a critical methodology for the study of small RNA. Butter (2014) [
153] is a Perl wrapper for samtools and the short-read aligner bowtie (an ultrafast, memory-efficient aligner geared toward quickly aligning large sets of short DNA sequences to large genomes) [
83,
154] to produce small RNA-seq alignments where multimapped small RNAs tend to be placed near regions of confidently high density.
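The idea of placing a multimapped read where confidently mapped reads are dense can be sketched as follows. This is a deliberately simplified illustration and not Butter's actual procedure: the fixed-size windows, the use of uniquely mapped reads as the density estimate, and the tie-breaking rule (first candidate wins) are all assumptions.

```python
from collections import Counter


def place_multimappers(unique_hits, multi_hits, window=100):
    """Assign each multimapped read to its candidate locus with the highest density.

    unique_hits: list of (chrom, pos) for uniquely mapped reads.
    multi_hits:  dict mapping a read ID to its candidate (chrom, pos) loci.
    Density is the number of unique reads in the same fixed-size window.
    """
    density = Counter((chrom, pos // window) for chrom, pos in unique_hits)
    placed = {}
    for read_id, candidates in multi_hits.items():
        placed[read_id] = max(
            candidates,
            key=lambda locus: density[(locus[0], locus[1] // window)],
        )
    return placed


unique_hits = [("chr1", 120), ("chr1", 150), ("chr2", 900)]
multi_hits = {"r1": [("chr2", 950), ("chr1", 130)]}
placement = place_multimappers(unique_hits, multi_hits)
```

Here the read is placed on chr1, where two uniquely mapped reads share its window, rather than on chr2, where there is only one.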
PIANO (2014) [
155] is a program for piRNA annotation that uses piRNA-transposon interaction information predicted by RNAplex [
156]. piRNAs are aligned to transposons and a support vector machine (SVM) is applied on triplet elements that combine structure and sequence information extracted from piRNA-transposon matching/pairing duplexes. The SVM classifier can predict human, mouse, and rat piRNAs, with overall accuracy greater than 90%.
Luo et al. introduced a method for differentiating transposon-derived piRNAs (2016) [
157] based on six sequence-derived features. The datasets were derived from NONCODE version 3.0 for three species, namely human, mouse, and
Drosophila with a 1:1 ratio of positive to negative samples so that the results could be compared with those from previous studies [
155]. The study adopted two approaches: direct combination, which merges different feature vectors, and ensemble learning, which uses the weighted average scores of individual feature-based predictors; however, the weights are determined arbitrarily based on the AUC scores of the base predictors. The prediction models employ a random forest as the main classifier engine and were evaluated by 10-fold cross-validation. Both methods achieved AUC and accuracy of at least 90% and 80%, respectively, on all three datasets. Both methods yield higher AUC scores than the
k-mer method [
150] and PIANO [
155].
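The two integration strategies can be contrasted in a few lines. The sketch is illustrative only: making the ensemble weights proportional to each base predictor's AUC is one possible reading of "based on the AUC scores" and is an assumption here, as are the function names and toy values.

```python
def direct_combination(feature_vectors):
    """Merge several per-sequence feature vectors into one long vector."""
    merged = []
    for vector in feature_vectors:
        merged.extend(vector)
    return merged


def weighted_ensemble(scores_by_predictor, auc_by_predictor):
    """Average base-predictor scores with weights proportional to their AUC.

    scores_by_predictor: dict name -> list of scores, one per sequence.
    auc_by_predictor:    dict name -> AUC of that base predictor.
    """
    total_auc = sum(auc_by_predictor.values())
    weights = {name: auc / total_auc for name, auc in auc_by_predictor.items()}
    n_sequences = len(next(iter(scores_by_predictor.values())))
    return [
        sum(weights[name] * scores[i] for name, scores in scores_by_predictor.items())
        for i in range(n_sequences)
    ]


merged = direct_combination([[0.1, 0.2], [0.3]])
ensemble_scores = weighted_ensemble(
    {"kmer_model": [1.0, 0.0], "structure_model": [0.0, 1.0]},
    {"kmer_model": 0.9, "structure_model": 0.6},
)
```

With these toy values the weights are 0.6 and 0.4, so the better-performing base predictor dominates the final score.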
Another study by the same researchers developed a genetic algorithm-based weighted ensemble method named GA-WE (2016) [
158] for predicting transposon-derived piRNAs, which was trained on piRNA datasets from human, mouse, and
D. melanogaster, shown in
Table 2. The GA-WE models, in contrast with the previously described work [
157], automatically determine the optimal weights on the validation set. The method achieves an AUC of at least 0.93 on both the balanced and unbalanced datasets in 10-fold cross-validation, and it produces lower scores in cross-species experiments, indicating that piRNAs derived from different species may have different patterns. With their previous work [
157], PIANO [
155], and the
k-mer scheme [
150] as benchmark methods, 10-fold cross-validation on the human dataset showed that GA-WE achieves higher AUC scores on all datasets. Regarding cross-species prediction, models constructed on the mouse dataset perform better on the human dataset, possibly because the piRNA length distribution is similar between the two mammals but differs between mouse and
Drosophila.
piPipes (2015) [
159] is a set of five pipelines for piRNA/transposon analysis of different next-generation sequencing libraries (small RNA-seq, RNA-seq, Genome-seq, ChIP-seq, CAGE/Degradome-Seq). piPipes allows for the analysis of a single library and pair-wise comparison between two samples. It is implemented in Bash, C++, Python, Perl, and R and provides a standardized set of tools to analyze these diverse data types.
Liu et al. (2017) introduced a two-layer ensemble classifier, 2L-piRNA [
160], which first addressed a double question: can we predict a piRNA based solely on sequence information, and can we distinguish whether it is of the type that instructs target mRNA deadenylation? In its first layer, 2L-piRNA identifies whether a query RNA molecule is a piRNA or not, while in the second layer it identifies whether or not a piRNA has the function of instructing target mRNA deadenylation. The authors constructed a benchmark dataset consisting of 709 piRNA sequences that have the function of instructing target mRNA deadenylation, 709 piRNA sequences that do not have that function (extracted from piRBase), and 1418 non-piRNA sequences. The sequences were represented using pseudo-K-tuple nucleotide composition (PseKNC) [
161] for K = 2, and six helical parameters (rise, roll, shift, slide, tilt, and twist) for each possible RNA dinucleotide were taken into account. Comparison of the performance of the method with the “Accurate piRNA prediction” by Luo et al. [
157] and GA-WE [
158] showed that 2L-piRNA outperforms them on all metrics, as seen in
Table 3.
In 2014, Brayet et al. proposed piRPred [
162], an extensible and adaptive classification method for piRNA prediction that combines heterogeneous types of piRNA features and allows for the implementation of newly discovered piRNA characteristics. piRPred fuses support vector machines (SVMs) and multiple kernels that represent the following features: frequency of certain
k-mer motifs, the presence of a Uridine base as the first nucleotide of the sequence, the distance to centromeres and telomeres, and the occurrence of piRNAs in clusters in the genome. The algorithm was trained on human and
Drosophila piRNAs (positive datasets) and tRNAs, mature miRNAs, and exonic sequences (negative dataset). Comparison with the
k-mer method (see
Table 4) proposed by Zhang et al. [
150] shows that piRPred performs better, especially on human piRNA data.
In 2015, Menor et al. [
163] introduced a method for piRNA and mature miRNA classification based on the previously described Multiclass Relevance Units Machine classifier (McRUM) [
164], an empirical Bayesian kernel method. piRNA datasets were extracted from NONCODE 3.0 and the sequences were represented using
k-mers, for
k = 1 through 5. The authors made use of correlation-based feature selection (CFS) to select a subset of features on which to build classifier models and reduce the dimensionality. Comparison with the
k-mer scheme [
150], both the original and the one retrained on the datasets of the McRUM-based approach, reveals that the sensitivity of the latter is roughly 60% higher.
V-ELMpiRNApred (2017) [
165] is based on an ensemble classifier called voting-based extreme learning machine (V-ELM). It implements a hybrid feature vector of
k-mer features (
k = 1 through 5) and short sequence motifs (SSM), a series of new features with 80 dimensions that allow the study of the relation between discontinuous sites of sequences. Feature selection is then used to remove the
k-mer features with redundant information. V-ELMpiRNApred was trained on human piRNA and non-piRNA sequences from NONCODE 3.0 and its classification performance was compared with methods published earlier.
Table 5 shows that V-ELMpiRNApred outperforms previously published methods on all metrics.
In the same year, Boucheham et al. introduced IpiRId [
166], which allows for the representation of different types of features by combining several kernels that can be tested independently, thus enabling the study of feature conservation across species. The features include genomic and epigenomic information; apart from the sequence, IpiRId takes into consideration the positions on the chromatin, the positions regarding the sequence and/or structural motifs that can occur at the 5′ and/or the 3′ ends, possible occurrence in clusters, and interaction with specific target sequences. IpiRId, at its core, is based on the Multiple Kernel Learning (MKL), which allows for combining heterogeneous features by automatically tuning their weights in order to improve the prediction. Comparison of the performance with previously published techniques on datasets consisting of piRNAs from human, mouse, and
Drosophila shows that IpiRId outperforms the other techniques, scoring more than 90% accuracy in all species and similar values in all the other metrics (see
Table 6). Note that PIANO was originally trained on piRNA datasets from
Drosophila. Moreover, the study of the pertinence of the features best represented across species reveals that the most conserved piRNA features are uridine and adenine at the first and tenth positions, respectively, occurrence in clusters, and binding to transposons.
In 2018, Wang et al. introduced piRNN [
167], the first deep learning program for piRNA identification, which is based on convolutional neural networks (CNN) and adopts a genome-independent approach that does not need genome and/or epigenomic data for identifying piRNAs. piRNN constructs a feature vector from the input sequences in two parts: first, it extracts the
k-mer (
k = 1, 2, 3, 4, 5) motif frequencies and second it updates the feature vector with
k-mer motifs around the 1st and 10th bases if the sequence starts with a T/U and/or has an A in the 10th position. Comparison of the performance with PIANO, 2L-piRNA, and the
k-mer scheme on human data (as representative of mammalian piRNAs) and
D. melanogaster piRNA data shows that piRNN outperforms the other methods on all metrics used (see
Table 7).
The authors provide four models trained on piRNA data for four species (human, rat, C. elegans, and D. melanogaster); if desired, the users can retrain the models or train new ones. All in all, piRNN is a useful tool for piRNA prediction in non-model organisms with limited genomic resources.
piRNAPred [
168] (2019) is an integrated framework for piRNA prediction that employs hybrid features like
k-mer nucleotide composition (k-KNC, k = 1 to 5; essentially k-mer counts normalized for sequence length), secondary structure (paired or unpaired state), and thermodynamic and physicochemical properties of contiguous dinucleotides, extracted from piRNA sequences of eight species. Comparison of the best-performing piRNAPred model with previously published methods reveals that the former exhibits the highest accuracy and a Matthews correlation coefficient (MCC) of 0.97 (
Table 8).
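The performance measures quoted throughout this subsection (accuracy, sensitivity, specificity, precision, and MCC) all derive from the four cells of a binary confusion matrix. A minimal sketch, with a hypothetical function name and made-up counts, is given below for reference.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0 by convention.
    """
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,   # recall
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts: 100 piRNAs and 100 non-piRNAs in a test set.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often preferred when positive and negative sets are of unequal size.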
2.2.3. Tools for the Analysis of Circular RNAs
Thirty years ago, circular RNAs (circRNAs) were described as “abnormally spliced transcripts” formed by scrambled exons [
169], a phenomenon known as “exon shuffling” or “non-co-linear splicing”. circRNAs produced by co- and posttranscriptional head-to-tail “backsplicing”, where an exon’s 3′ splice site is ligated onto an upstream 5′ splice site of an exon on the same RNA molecule, as well as circRNAs generated from intronic lariats during colinear splicing, may exhibit physiologically relevant regulatory functions in eukaryotes [
170]. It has been demonstrated that circRNA production and canonical pre-mRNA splicing compete with each other and some splicing factors like
muscleblind might interact with flanking introns to promote exon circularization [
171]. CircRNAs are abundant in eukaryotic cells; measurements in human fibroblast cells revealed that there are over 25,000 circRNA isoforms per cell [
172]. Up to 23% of the actively transcribed human genes give rise to circRNAs whose expression is dynamically regulated between tissues, cell types, and during differentiation [
173]. Genome-wide studies revealed that half of the circRNAs do not contain intervening introns, whereas in hematopoietic progenitor cells introns are retained in 20% of the circRNAs [
174]. Pseudogenes can be retrotranscribed from circRNAs and they can also be inherited in mammalian genomes [
175,
176]. Bioinformatic analysis has shown that the intronic flanks adjacent to circularized exons are enriched in complementary Alu repeats [
172], with Alu elements being the most abundant TEs [
177]. The alternative formation of inverted repeated Alu pairs and the competition between them can mediate alternative circularization, which leads to multiple circRNA transcripts [
178]. A recent study [
176] has indicated that circRNAs and TEs possibly co-evolve in a species-specific and dynamic manner. Their findings suggest a model according to which many circRNAs emerged convergently during evolution, as a byproduct of splicing in orthologs prone to insertion of TEs.
DeepCirCode [
179] utilizes a 2-layer convolutional neural network (CNN) to predict back-splicing for the formation of human circRNAs. The model takes as an input the binary vector (one-hot encoding) of the intron and exon sequences flanking the potential back-splicing sites and it predicts whether the two sites can be back-spliced. The kernels in the first layer detect the motif sites related to back-splicing, whereas the kernels in the second layer learn more complex motifs. The model was trained on human exonic circRNAs from the publicly accessible databases circRNADb [
180] and circBase [
181]. The performance of the model was compared with an SVM and an RF model [
182] (see
Table 9), previously constructed by the authors, that use
k-mer compositional features.
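The one-hot input representation mentioned above can be sketched as follows. The column order (A, C, G, T) and the all-zero row for ambiguous bases are assumptions made for the example and may differ from DeepCirCode's own encoding.

```python
def one_hot(seq):
    """Encode a nucleotide sequence as a list of 4-element binary rows.

    Columns are ordered A, C, G, T (U is treated as T); any other symbol,
    such as N, becomes an all-zero row.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    matrix = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


encoded = one_hot("ACGUN")
```

A sequence of length L thus becomes an L × 4 binary matrix, the form on which first-layer convolutional kernels can act as motif detectors.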
Relevant features learned by DeepCirCode are represented as sequence motifs, some of which match human known motifs involved in RNA splicing, transcription, or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly.
Previously published bioinformatics and machine learning methods are suited to animal circRNAs. Plants, however, are rich in splicing signals and transposable elements, and the characteristics of their circRNAs differ from those in animals. Yin et al. [
183] recently presented PCirc, a method that extracts a variety of features (including open reading frames, numbers of
k-mers, and splicing junction sequence coding) from rice circRNAs and trains a machine learning model based on a random forest algorithm. The classification performance of PCirc was evaluated by accuracy, precision, and F1 score, all of which were above 0.99 when using rice circRNAs and lncRNAs as positive and negative datasets, respectively. Testing the model on other plant datasets yielded accuracy scores above 0.8.
A summary of the methods discussed above, including the year of publication and the web address at which the code/data (if any) are deposited, is provided in
Table 10.