MicroRNAs (miRNAs) are ~22 nt non-coding transcripts that play a significant role in the regulation of gene expression [1]. In the last few years, thousands of novel miRNAs have been discovered by bioinformatics analysis of small RNA (sRNA) deep sequencing (sRNA-seq) data [2]. Moreover, analysis of sRNA-seq data led to the discovery of microRNA-offset RNAs (moRNAs), a class of miRNA-sized RNAs that arise from pre-miRNA proximal regions [3]. Furthermore, high-throughput RNA sequencing technology (RNA-seq), which allows sRNA sequences to be inspected at nucleotide resolution, enabled the identification of sRNA variants (isomiRs), which can diversify miRNA function through modulation of miRNA-target recognition [4].
Since their discovery, moRNAs have been detected in different organisms [3], including humans [11]. MoRNAs are characterized by high sequence conservation [18] and have been demonstrated to be functional RNAs [14], potentially having an miRNA-like regulatory effect [5]. Moreover, it has been shown that moRNA expression can vary between different phenotypes in mammals [19] and is deregulated in human disease [15]. Although moRNA biogenesis has not yet been fully elucidated, it has been hypothesized that moRNAs may originate from alternative Drosha/DGCR8-mediated processing of the pre-miRNA hairpins [15]. In addition, moRNA expression can be independent of, or compete with, that of the adjacent miRNA [14].
To detect moRNAs, most authors had to devise custom bioinformatics procedures that analyzed the sRNA-seq data left uncharacterized by the existing software tools. To date, no tool that explicitly characterizes moRNAs, miRNAs, and isomiRs from sRNA-seq data is publicly and freely available to the scientific community [2]. To fill this gap, we improved our method, which was successfully used to uncover and study miRNA-like RNAs [8], and developed and publicly released miR&moRe2.
This manuscript describes the miR&moRe2 pipeline and a series of tests based on six datasets. First, miR&moRe2 results are compared with findings from three previous moRNA studies to show that miR&moRe2 can detect previously described moRNAs. Then, miR&moRe2 predictions are validated using small RNA expression data obtained upon knockdown of the miRNA biogenesis pathway. Finally, to illustrate its usefulness and discovery power, miR&moRe2 is applied to a large sRNA-seq dataset in which moRNAs were not previously investigated.
The advent of high-throughput RNA sequencing coupled with advanced bioinformatics analysis provided molecular biology researchers with a technology of unprecedented discovery power [35]. The recent identification of novel RNA molecules by means of innovative bioinformatics methods [36] showed that thorough RNA-seq data inspection can be rewarding. MoRNAs, in particular, were identified and further examined by a few studies performing custom analysis combined with manual curation of specific datasets [2]. Nevertheless, even though the first description of moRNAs dates back more than ten years [3], the lack of bioinformatics tools explicitly considering moRNAs may have contributed to these RNAs being overlooked in many studies.
In this work, we presented miR&moRe2, a novel bioinformatics tool for the detection and quantification of miRNAs, moRNAs, and their isoforms from sRNA-seq data. The former implementation of miR&moRe [6] proved successful in applied studies [8] but was developed as an in-house method and considered only human data. Since then, the miR&moRe pipeline has been considerably improved by adding new features. Now, miR&moRe2 can be used for any species for which a reference genome has been assembled. Moreover, it includes prediction of miRNA precursors, allowing, in turn, the identification of moRNAs derived from still unannotated precursors. Furthermore, the code was thoroughly revised to support updated versions of the tools included in the pipeline, as well as to increase ease of use and computational efficiency through parallel computing.
Regarding the design and implementation of the miR&moRe2 pipeline, a series of filters is applied to the raw data by means of efficient methods [40], and optimal parameters for read alignment are employed [41]. Moreover, miR&moRe2 makes use of the best-performing and widely used methods for miRNA prediction, miRDeep2 [25] and RNAfold [42]. Altogether, these implementation strategies aim to reduce false predictions derived from poor-quality sequencing data.
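As an illustration of the kind of raw-data filtering described above, the following is a minimal sketch of a read filter based on length, ambiguous bases, and mean base quality. The `Read` type, the `filter_reads` function, and all threshold values are hypothetical and chosen only for the example; they are not the filters or defaults implemented in miR&moRe2.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container for this sketch)."""
    name: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def mean_phred(qual: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_reads(reads: Iterable[Read],
                 min_len: int = 15,        # assumed lower bound for sRNA reads
                 max_len: int = 30,        # assumed upper bound for sRNA reads
                 min_mean_q: float = 30.0  # assumed mean-quality threshold
                 ) -> Iterator[Read]:
    """Yield only reads that pass the length, ambiguity, and quality checks."""
    for read in reads:
        if not (min_len <= len(read.seq) <= max_len):
            continue  # too short or too long to be an miRNA-sized RNA
        if "N" in read.seq:
            continue  # contains an ambiguous base call
        if mean_phred(read.qual) < min_mean_q:
            continue  # low average base quality
        yield read


reads = [
    Read("ok", "ACGT" * 5, "I" * 20),        # 20 nt, mean quality 40
    Read("short", "ACGT" * 2, "I" * 8),      # 8 nt, fails the length check
    Read("low_q", "ACGT" * 5, "#" * 20),     # mean quality 2
    Read("ambiguous", "ACGN" * 5, "I" * 20)  # contains N
]
kept = [r.name for r in filter_reads(reads)]
```

With these toy reads, only the read named `ok` passes all three checks. Discarding low-quality reads before alignment is one simple way to limit spurious alignments that could otherwise be reported as novel sRNAs.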
We applied miR&moRe2 to public datasets from three previous studies (ASI, BUR, and MAV) reporting on moRNAs. In the first study (ASI) [14], moRNAs were specifically reported along with their expression estimates. In contrast, the second work (BUR) [20] detected moRNAs and reported them using conventional naming [43], whereas in MAV [29] moRNAs were referred to with a custom denomination. Straightforward comparisons were not possible since each work applied its own custom discovery and analysis pipeline, each of which, in addition, was based on now outdated miRNA annotation. Moreover, the authors did not provide automated software to replicate their analysis. These issues would not have arisen if the authors could have used an automated bioinformatics pipeline such as miR&moRe2. We speculate that by tuning miR&moRe2 parameters, for instance regarding read pre-processing and alignment, our method could identify additional moRNAs and increase the match with previous works. However, given the lack of a gold standard dataset for moRNA validation, achieving a perfect match with other authors’ findings was not the aim of this study. Nevertheless, we observed significant overlap between the original works’ results and those of miR&moRe2, which supported the reliability of our method’s findings. Moreover, novel moRNAs were detected from the analyzed data.
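The comparison with previously reported moRNAs can be thought of as a set overlap once names from the different studies have been mapped to a common convention. The sketch below assumes that this mapping has already been done; the function `overlap_summary` and the moRNA names are hypothetical placeholders, not results from the datasets discussed here.

```python
from typing import Dict, Iterable, List, Union


def overlap_summary(reported: Iterable[str],
                    detected: Iterable[str]) -> Dict[str, Union[List[str], float]]:
    """Compare moRNAs reported by a previous study with those detected by a pipeline.

    Both inputs are assumed to use the same naming convention already.
    """
    reported_set, detected_set = set(reported), set(detected)
    shared = reported_set & detected_set
    return {
        "shared": sorted(shared),
        "only_reported": sorted(reported_set - detected_set),
        "only_detected": sorted(detected_set - reported_set),
        # Fraction of previously reported moRNAs that were recovered.
        "recovered_fraction": (len(shared) / len(reported_set)
                               if reported_set else float("nan")),
    }


# Toy, made-up names used only to show the call.
previous_study = ["moR-A-5p", "moR-B-3p", "moR-C-5p", "moR-D-3p"]
pipeline_output = ["moR-A-5p", "moR-B-3p", "moR-C-5p", "moR-E-5p"]
summary = overlap_summary(previous_study, pipeline_output)
```

In this toy case three of the four previously reported names are recovered and one name is found only by the pipeline. In practice, the name-mapping step is the hard part when studies use custom denominations or outdated annotation.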
In accordance with previous reports [11], moRNAs were more abundant in the nuclear fraction of cellular RNA. Further, moRNAs were generally less expressed than miRNAs, but specific moRNAs were abundant, even more so than the flanking miRNA [11], representing an alternative product to the mature miRNA from the same hairpin arm.
To further evaluate miR&moRe2 predictions, we analyzed the FRI dataset [30], in which the miRNA biogenesis pathway was silenced at different stages, and we observed that the moRNAs and new miRNAs identified were downregulated similarly to the known miRNAs. Beyond indirectly validating our method’s predictions, this supports the previous hypothesis that moRNA and miRNA biogenesis are linked [11].
To show that moRNA expression has been disregarded in previous studies, we applied miR&moRe2 to sRNA-seq data from a sizeable set of blood cell population samples from different healthy donors (JUZ dataset) [31], providing the first large-scale comparative moRNA expression analysis. Tens of moRNAs and new miRNAs were detected, albeit with lower abundance than known miRNAs. However, the consistent expression across cell populations of a few moRNAs, such as moR-150-3p, moR-421-5p, and moR-103a-2-3/5p, suggests that they could be a constitutive part of the normal blood cell transcriptome. This last analysis was intended simply to illustrate the possibilities enabled by miR&moRe2, including the re-analysis of the many datasets available in RNA-seq repositories. Nonetheless, our results set the basis for further investigation of the novel sRNAs predicted by miR&moRe2 in blood cells.
Unlike other sRNA-seq analysis tools [2], miR&moRe2 allows a comprehensive characterization of the sRNAs generated by known and predicted miRNA precursors by detecting and quantifying the expression of both miRNAs and moRNAs with homogeneous criteria.
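To make the idea of characterizing miRNAs and moRNAs from the same precursor more concrete, the sketch below assigns a read aligned to an extended hairpin to a category based only on its position relative to the mature miRNA of the same arm. It is a deliberately simplified illustration: the function `classify_read`, the coordinate convention, and the `min_overlap` threshold are assumptions made for this example and do not reproduce the rules used by miR&moRe2.

```python
def classify_read(read_start: int, read_end: int,
                  mir_start: int, mir_end: int,
                  arm: str, min_overlap: float = 0.5) -> str:
    """Classify a read by its position relative to a mature miRNA.

    Coordinates are 0-based, half-open, on the extended precursor in 5'->3'
    orientation. `arm` is "5p" or "3p". `min_overlap` is a hypothetical
    threshold on the fraction of the read that must lie within the miRNA.
    """
    if read_end <= read_start:
        raise ValueError("read_end must be greater than read_start")
    shared = max(0, min(read_end, mir_end) - max(read_start, mir_start))
    if shared / (read_end - read_start) >= min_overlap:
        return "miRNA"  # includes isomiRs with shifted or trimmed ends
    midpoint = (read_start + read_end) / 2
    # moRNAs lie on the outer side of the miRNA, away from the hairpin loop:
    # upstream of a 5p miRNA or downstream of a 3p miRNA.
    if arm == "5p" and midpoint < mir_start:
        return "moRNA"
    if arm == "3p" and midpoint >= mir_end:
        return "moRNA"
    return "other"  # e.g. loop-derived fragments


labels = [
    classify_read(30, 52, 30, 52, "5p"),   # exact mature miRNA
    classify_read(32, 54, 30, 52, "5p"),   # shifted variant (isomiR)
    classify_read(8, 30, 30, 52, "5p"),    # upstream flank of the 5p miRNA
    classify_read(52, 70, 30, 52, "5p"),   # loop side of the 5p miRNA
    classify_read(92, 112, 70, 92, "3p"),  # downstream flank of the 3p miRNA
]
```

Applying the same positional criteria to every read of a precursor, whether annotated or predicted, is what allows miRNAs and moRNAs to be quantified on a comparable basis.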
As in our earlier studies [15], the sRNA characterization performed by miR&moRe2 allowed a comprehensive evaluation of miRNA and moRNA differential expression. Interestingly, moRNAs whose expression levels varied significantly among cell populations were identified. MoR-150-3p, which was highly expressed in lymphocytes, was previously validated, confirming its high expression in B-cells and plasma cells [17]. MoR-103a-2-3p, highly expressed in the present study, was previously found to be very abundant in JAK2-mutated cancer cells [13], overexpressed in stem cells, and functionally linked to its flanking miRNA [14]. These results underline that accounting for moRNAs in comparative sRNA analysis can enrich the findings. Similar to miRNAs, moRNAs could have pleiotropic effects or act as fine tuners, and they were hypothesized to cooperate with miRNAs to enhance miRNA function [14]. For these reasons, researchers should not disregard moRNA expression.
MoRNAs were shown to be expressed in different species, from ascidians to mammals, and also from viral genomes [33]. Our sample analyses illustrated that miR&moRe2 can also be applied to human data to obtain a metatranscriptomic profile.
In conclusion, we demonstrated that miR&moRe2 is a valid bioinformatics tool to comprehensively analyze all the currently known sRNAs that can originate from each miRNA precursor gene. Using miR&moRe2 for sRNA analysis projects can contribute to increasing our knowledge of moRNAs and to the understanding of non-coding sRNA biogenesis and function.