Algorithms and Workflows in RNA Bioinformatics

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (15 January 2021) | Viewed by 28911

Special Issue Editors


Guest Editor
Computational Biology, Institute of Biochemical Engineering, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
Interests: bioinformatics; RNA structure; RNA-based regulation; metagenomics

Guest Editor
Bioinformatics, Institute of Computer Science, University of Leipzig, 04109 Leipzig, Germany
Interests: evolution; computational biology; structural biology; algorithmic bioinformatics; cheminformatics

Special Issue Information

Dear Colleagues,

RNA is central to the majority of cellular processes in all domains of life, and the number of noncoding RNAs is on par with the number of protein-coding RNAs. Common to most RNAs is that their function is determined by their structure. Computational methods to study RNA structure are therefore important tools for elucidating RNA function. Furthermore, RNAs interact with other RNAs and with proteins, forming complex regulatory networks. Assessing these interactions computationally enables a holistic view of cellular regulation. In recent years, the field has expanded from a purely algorithmic to a data-integrative discipline, employing diverse computational methods such as dynamic programming, integer programming, and machine learning. This Special Issue in Genes on “Algorithms and Workflows in RNA Bioinformatics” addresses methodological and algorithmic developments that help to efficiently analyze the huge amount of available data, to study RNA structure in great detail, and to elucidate RNA function and its integration into large cellular networks. Case studies are also welcome but should specifically address limitations or shortcomings of current computational approaches.

Prof. Björn Voß
Prof. Peter F. Stadler
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as they are accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • RNA structure analysis
  • RNA–RNA and RNA–protein interactions
  • RNA evolution
  • RNA networks
  • Chemical modifications of RNA

Published Papers (9 papers)

Research

13 pages, 2752 KiB  
Article
CircIMPACT: An R Package to Explore Circular RNA Impact on Gene Expression and Pathways
by Alessia Buratin, Enrico Gaffo, Anna Dal Molin and Stefania Bortoluzzi
Genes 2021, 12(7), 1044; https://doi.org/10.3390/genes12071044 - 6 Jul 2021
Cited by 3 | Viewed by 2852
Abstract
Circular RNAs (circRNAs) are transcripts generated by back-splicing. CircRNAs might regulate cellular processes by different mechanisms, including interaction with miRNAs and RNA-binding proteins. CircRNAs are pleiotropic molecules whose dysregulation has been linked to human diseases and can drive cancer by impacting gene expression and signaling pathways. The detection of circRNAs aberrantly expressed in disease conditions calls for the investigation of their functions. Here, we propose CircIMPACT, a bioinformatics tool for the integrative analysis of circRNA and gene expression data to facilitate the identification and visualization of the genes whose expression varies according to circRNA expression changes. This tool can highlight regulatory axes potentially governed by circRNAs, which can be prioritized for further experimental study. The usefulness of CircIMPACT is exemplified by a case study analysis of bladder cancer RNA-seq data. The link between circHIPK3 and heparanase (HPSE) expression, due to the circHIPK3-miR558-HPSE regulatory axis previously determined by experimental studies on cell lines, was successfully detected. CircIMPACT is freely available at GitHub.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
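As a rough illustration of relating gene expression to circRNA expression changes, the Python sketch below splits samples into high and low groups at the median expression of one circRNA and ranks genes by the difference in their mean expression between the groups. This is a simplified, hypothetical procedure, not CircIMPACT's actual method (the package itself is written in R); the function name `circ_stratified_shift` and the input layout are assumptions made for this example.

```python
from statistics import mean, median

def circ_stratified_shift(circ_expr, gene_expr):
    """Hypothetical sketch: stratify samples by one circRNA's expression and
    return, per gene, mean(high group) - mean(low group), largest |shift| first.

    circ_expr: dict sample -> circRNA expression (e.g. log CPM)
    gene_expr: dict gene -> dict sample -> expression on the same scale
    """
    cutoff = median(circ_expr.values())
    high = [s for s, v in circ_expr.items() if v > cutoff]
    low = [s for s, v in circ_expr.items() if v <= cutoff]
    if not high or not low:
        raise ValueError("circRNA expression does not separate the samples")
    shifts = {}
    for gene, values in gene_expr.items():
        shifts[gene] = mean(values[s] for s in high) - mean(values[s] for s in low)
    # Rank genes by the magnitude of the shift, largest first
    return dict(sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True))
```

With toy inputs such as four samples and two genes, a gene that rises with the circRNA appears first in the result, while a flat gene gets a shift of zero. A real analysis would also need a statistical test per gene, which this sketch omits.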

14 pages, 270 KiB  
Article
Improving RNA Branching Predictions: Advances and Limitations
by Svetlana Poznanović, Carson Wood, Michael Cloer and Christine Heitsch
Genes 2021, 12(4), 469; https://doi.org/10.3390/genes12040469 - 25 Mar 2021
Cited by 2 | Viewed by 1731
Abstract
Minimum free energy prediction of RNA secondary structures is based on the Nearest Neighbor Thermodynamics Model. While such predictions are typically good, the accuracy can vary widely even for short sequences, and the branching thermodynamics are an important factor in this variance. Recently, the simplest model for multiloop energetics, a linear function of the number of branches and unpaired nucleotides, was found to be the best. Subsequently, a parametric analysis demonstrated that per-family accuracy can be improved by changing the weightings in this linear function. However, the extent of improvement was not known due to the ad hoc method used to find the new parameters. Here we develop a branch-and-bound algorithm that finds the set of optimal parameters with the highest average accuracy for a given set of sequences. Our analysis shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets. Moreover, cross-family improvement is possible but more difficult because competing parameter regions favor different families. The results also indicate that restricting the unpaired nucleotide penalty to small values is warranted. This reduction makes analyzing longer sequences using the present techniques more feasible.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
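The linear multiloop model mentioned above assigns a loop the penalty a + b·(branches) + c·(unpaired nucleotides). The Python sketch below evaluates that penalty for one multiloop in a dot-bracket structure. The parameter names `a`, `b`, `c`, the convention of counting the closing pair as a branch, and the 0-based indexing are assumptions for illustration; no real parameter values are implied, and the branch-and-bound search over (a, b, c) described in the abstract is not shown.

```python
def pair_table(db):
    """Map every paired position in a dot-bracket string to its partner (0-based)."""
    stack, pt = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opening = stack.pop()
            pt[opening], pt[pos] = pos, opening
    return pt

def multiloop_penalty(db, i, a, b, c):
    """Return (a + b*branches + c*unpaired, branches, unpaired) for the loop
    closed by the base pair that opens at position i."""
    pt = pair_table(db)
    j = pt[i]
    branches, unpaired = 1, 0      # assumption: the closing pair (i, j) counts as one branch
    k = i + 1
    while k < j:
        if k in pt:                # inner helix: count it and jump past its partner
            branches += 1
            k = pt[k] + 1
        else:                      # unpaired nucleotide inside the loop
            unpaired += 1
            k += 1
    if branches < 3:
        raise ValueError("the pair at position %d does not close a multiloop" % i)
    return a + b * branches + c * unpaired, branches, unpaired
```

For the structure `(.(...).(...).)` closed at position 0, the loop has three branches and three unpaired nucleotides, so with the placeholder weights a=1, b=2, c=3 the penalty is 1 + 6 + 9 = 16. Because the model is linear in (a, b, c), each structure's multiloop score is a dot product with the count vector, which is what makes a parametric search over the weights tractable.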

17 pages, 909 KiB  
Article
miRNAture—Computational Detection of microRNA Candidates
by Cristian A. Velandia-Huerto, Jörg Fallmann and Peter F. Stadler
Genes 2021, 12(3), 348; https://doi.org/10.3390/genes12030348 - 27 Feb 2021
Cited by 2 | Viewed by 2580
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including parameter-optimized blast, nhmmer, or cmsearch runs designed to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of specific features typical for an RNA family and/or the analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates homology search and validation steps. The miRNAture pipeline yields very good results for a large number of “typical” miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs deriving from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)

12 pages, 940 KiB  
Article
Simulation of Folding Kinetics for Aligned RNAs
by Jiabin Huang and Björn Voß
Genes 2021, 12(3), 347; https://doi.org/10.3390/genes12030347 - 26 Feb 2021
Viewed by 1521
Abstract
Studying the folding kinetics of an RNA can provide insight into its function and is thus a valuable method for RNA analyses. Computational approaches to the simulation of folding kinetics suffer from the exponentially large folding space that needs to be evaluated. Here, we present a new approach that combines structure abstraction with evolutionary conservation to restrict the analysis to common parts of folding spaces of related RNAs. The resulting algorithm can recapitulate the folding kinetics known for single RNAs and is able to analyse even long RNAs in reasonable time. Our program RNAliHiKinetics is the first algorithm for the simulation of consensus folding kinetics and addresses a long-standing problem in a new and unique way.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
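Kinetics on an abstracted folding space is commonly modeled as a continuous-time Markov process over a small set of macrostates. The Python sketch below integrates the corresponding master equation with explicit Euler steps, using Arrhenius-type rates over a barrier (saddle) energy shared by both directions of a transition. It is a generic illustration, not the RNAliHiKinetics algorithm: the state energies, barrier energies, the rate formula without a prefactor (so time is in arbitrary units), and RT ≈ 0.6163 kcal/mol (37 °C) are all assumptions.

```python
import math

def simulate_master_equation(energies, barriers, p0, rt=0.6163, dt=0.01, t_end=10.0):
    """Euler integration of dp_x/dt = sum_y (k[y][x] p_y - k[x][y] p_x).

    energies: list of macrostate free energies (kcal/mol, assumed)
    barriers: dict (x, y) -> saddle energy between states x and y
    p0:       initial occupation probabilities (should sum to 1)
    """
    n = len(energies)
    # Arrhenius-type rates: crossing from x to y costs (saddle - E_x)
    k = [[0.0] * n for _ in range(n)]
    for (x, y), saddle in barriers.items():
        k[x][y] = math.exp(-(saddle - energies[x]) / rt)
        k[y][x] = math.exp(-(saddle - energies[y]) / rt)
    p = list(p0)
    for _ in range(int(round(t_end / dt))):
        flux = [0.0] * n
        for x in range(n):
            for y in range(n):
                if k[x][y]:
                    f = k[x][y] * p[x] * dt   # probability moved from x to y in this step
                    flux[x] -= f
                    flux[y] += f
        p = [pi + fi for pi, fi in zip(p, flux)]
    return p
```

Because each barrier is shared by both directions, the rates satisfy detailed balance, so a long enough simulation of two states converges to populations whose ratio is the Boltzmann factor exp(−ΔE/RT). The time step must stay small relative to the inverse of the largest rate for the explicit Euler scheme to remain stable.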

14 pages, 1313 KiB  
Article
Non-Redundant tRNA Reference Sequences for Deep Sequencing Analysis of tRNA Abundance and Epitranscriptomic RNA Modifications
by Florian Pichot, Virginie Marchand, Mark Helm and Yuri Motorin
Genes 2021, 12(1), 81; https://doi.org/10.3390/genes12010081 - 10 Jan 2021
Cited by 7 | Viewed by 2787
Abstract
Analysis of RNA by deep-sequencing approaches has found widespread application in modern biology. In addition to measurements of RNA abundance under various physiological conditions, such techniques are now widely used for mapping and quantification of RNA modifications. Transfer RNA (tRNA) molecules are among the frequent targets of such investigation, since they contain multiple modified residues. However, the major challenge in tRNA examination is the large number of duplicated and point-mutated genes encoding these RNA molecules. Moreover, the existence of multiple isoacceptors/isodecoders complicates both the analysis and read mapping. Existing databases for tRNA sequencing provide near-exhaustive listings of tRNA genes, but using such highly redundant reference sequences in RNA-seq analyses leads to a large number of ambiguously mapped sequencing reads. Here we describe a relatively simple computational strategy for semi-automatic collapsing of highly redundant tRNA datasets into a non-redundant collection of reference tRNA sequences. The relevance of the approach was validated by analysis of experimentally obtained tRNA-sequencing datasets for different prokaryotic and eukaryotic model organisms. The data demonstrate that non-redundant tRNA reference sequences improve unambiguous mapping of deep sequencing data.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
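The general idea of collapsing redundant reference sequences can be illustrated with greedy clustering by mismatch count. The Python sketch below is a hypothetical simplification, not the authors' semi-automatic strategy: it groups only equal-length sequences, uses Hamming distance with an arbitrary `max_mismatches` threshold, and keeps the first sequence of each group as its representative. The function names are illustrative.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def collapse_references(seqs, max_mismatches=1):
    """Greedily group sequences that differ from a group representative by at
    most max_mismatches substitutions; return {representative: [members]}."""
    clusters = {}   # representative name -> list of member names
    reps = []       # (name, sequence) of representatives, in creation order
    for name, seq in seqs.items():
        for rep_name, rep_seq in reps:
            if len(rep_seq) == len(seq) and hamming(rep_seq, seq) <= max_mismatches:
                clusters[rep_name].append(name)
                break
        else:
            # no existing representative is close enough: start a new group
            reps.append((name, seq))
            clusters[name] = [name]
    return clusters
```

Reads would then be mapped against the representatives only, so identical or near-identical gene copies no longer compete for the same read. Note that greedy clustering depends on input order, and real tRNA collections also contain length variants that this sketch never merges.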

12 pages, 2228 KiB  
Article
DIANA-mAP: Analyzing miRNA from Raw NGS Data to Quantification
by Athanasios Alexiou, Dimitrios Zisis, Ioannis Kavakiotis, Marios Miliotis, Antonis Koussounadis, Dimitra Karagkouni and Artemis G. Hatzigeorgiou
Genes 2021, 12(1), 46; https://doi.org/10.3390/genes12010046 - 30 Dec 2020
Cited by 9 | Viewed by 4067
Abstract
microRNAs (miRNAs) are small non-coding RNAs (~22 nt) that are considered central post-transcriptional regulators of gene expression and key components in many pathological conditions. Next-Generation Sequencing (NGS) technologies have led to inexpensive, massive data production, revolutionizing every research aspect in the fields of biology and medicine. In particular, small RNA-Seq (sRNA-Seq) enables small non-coding RNA quantification on a high-throughput scale, providing a closer look into the expression profiles of these crucial regulators within the cell. Here, we present DIANA-microRNA-Analysis-Pipeline (DIANA-mAP), a fully automated computational pipeline that allows the user to perform miRNA NGS data analysis from raw sRNA-Seq libraries to quantification and Differential Expression Analysis in an easy, scalable, efficient, and intuitive way. Emphasis has been given to data pre-processing, an early step that is critical for the robustness of the final results and conclusions. Through modularity, parallelizability and customization, DIANA-mAP produces high-quality expression results, reports and graphs for downstream data mining and statistical analysis. In an extended evaluation, the tool outperforms similar tools, providing pre-processing without any prior knowledge of the adapter. Finally, DIANA-mAP is freely available, either dockerized with no dependency installations or as a standalone tool, and is accompanied by an installation manual on GitHub.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
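Pre-processing without prior adapter knowledge requires inferring the adapter from the reads themselves. One simple heuristic, sketched below in Python, exploits the fact that in a small RNA library the short inserts are read through into a 3′ adapter shared by most reads, while the inserts themselves vary: it returns the k-mer present in the largest number of reads. This is an illustrative assumption, not DIANA-mAP's documented procedure; it recovers only a fragment of the adapter, and the default `k` is arbitrary.

```python
from collections import Counter

def guess_adapter_kmer(reads, k=10):
    """Return the k-mer that occurs in the largest number of reads.

    Each distinct k-mer is counted at most once per read, so k-mers from a
    shared adapter outscore k-mers from the variable inserts.
    """
    counts = Counter()
    for read in reads:
        counts.update({read[s:s + k] for s in range(len(read) - k + 1)})
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A practical tool would extend this seed k-mer into a full adapter sequence and then trim it (allowing mismatches) from each read; counting per read rather than per position keeps low-complexity inserts such as poly-A stretches from dominating the counts.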

9 pages, 295 KiB  
Article
RNA Secondary Structures with Limited Base Pair Span: Exact Backtracking and an Application
by Ronny Lorenz and Peter F. Stadler
Genes 2021, 12(1), 14; https://doi.org/10.3390/genes12010014 - 24 Dec 2020
Cited by 4 | Viewed by 1841
Abstract
The accuracy of RNA secondary structure prediction decreases with the span of a base pair, i.e., the number of nucleotides that it encloses. The dynamic programming algorithms for RNA folding can easily be specialized to consider only base pairs with a limited span L, reducing the memory requirements to O(nL), and further to O(n) by interleaving backtracking. However, the latter is an approximation that precludes the retrieval of the globally optimal structure. So far, the ViennaRNA package has therefore not provided a tool for computing the optimal span-restricted minimum energy structure. Here, we report on an efficient backtracking algorithm that reconstructs the globally optimal structure from the locally optimal fragments that are produced by the interleaved backtracking implemented in RNALfold. An implementation is integrated into the ViennaRNA package. The forward and the backtracking recursions of RNALfold are both easily constrained to structural components with sufficiently negative z-scores. This provides a convenient method to identify hyper-stable structural elements. A screen of the C. elegans genome shows that such features are more abundant in real genomic sequences than in a dinucleotide-shuffled background model.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
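How a span limit enters a folding recursion can be seen in a stripped-down dynamic program. The Python sketch below substitutes simple base-pair maximization (Nussinov-style) for the nearest-neighbor energy model used by RNALfold, and defines the span of a pair (i, k) as k − i + 1; both are simplifying assumptions, and the function name is illustrative. It keeps the full n × n table, so it does not achieve the O(nL) memory bound; it only shows that the pairing partner k of position i is searched within a window of length L.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def max_pairs_limited_span(seq, max_span, min_loop=3):
    """Maximum number of nested base pairs in seq, allowing only pairs (i, k)
    with k - i + 1 <= max_span and at least min_loop unpaired bases per hairpin."""
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]   # N[i][j]: best score for subsequence seq[i..j]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = N[i + 1][j]        # case 1: position i stays unpaired
            # case 2: i pairs with k, but only inside the span window
            for k in range(i + min_loop + 1, min(j, i + max_span - 1) + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inner = N[i + 1][k - 1]
                    right = N[k + 1][j] if k < j else 0
                    best = max(best, inner + 1 + right)
            N[i][j] = best
    return N[0][n - 1]
```

For `GGGAAACCC`, an unrestricted span allows the three nested G·C pairs, whereas a span of 5 leaves only the innermost pair. A genuinely local folder would also bound j − i by L for the subproblems it stores, which is where the O(nL) memory saving comes from.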

17 pages, 5995 KiB  
Article
RNAflow: An Effective and Simple RNA-Seq Differential Gene Expression Pipeline Using Nextflow
by Marie Lataretu and Martin Hölzer
Genes 2020, 11(12), 1487; https://doi.org/10.3390/genes11121487 - 10 Dec 2020
Cited by 13 | Viewed by 7757
Abstract
RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. On top of that, previous studies have demonstrated that RNA-Seq studies can be irreproducible. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which ensures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low-abundance gene filtering. Apart from various visualizations of the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality using qRT-PCR data as a reference and observed a very high correlation of the log-transformed gene expression fold changes.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
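Comparing RNA-Seq fold changes with qRT-PCR, as in the evaluation described above, comes down to computing per-gene log2 fold changes and a correlation coefficient. The Python sketch below shows that arithmetic under simplifying assumptions: counts are taken as already library-size normalized, low-abundance genes are dropped with an arbitrary mean-count threshold, a pseudocount of 1 avoids division by zero, and Pearson correlation is used. It is not RNAflow's implementation, and the function names are illustrative.

```python
import math

def log2_fold_changes(counts_a, counts_b, min_count=10, pseudocount=1.0):
    """Per-gene log2((mean_b + pc) / (mean_a + pc)) for normalized counts,
    skipping genes whose mean count is below min_count in both conditions."""
    lfc = {}
    for gene in counts_a:
        mean_a = sum(counts_a[gene]) / len(counts_a[gene])
        mean_b = sum(counts_b[gene]) / len(counts_b[gene])
        if max(mean_a, mean_b) < min_count:
            continue   # low-abundance filter (threshold is an assumption)
        lfc[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return lfc

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Only genes that survive the abundance filter and also have a qRT-PCR measurement should enter the correlation; filtering first avoids the unstable, pseudocount-dominated fold changes of barely expressed genes.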

11 pages, 1137 KiB  
Article
Direct Interactions with Nascent Transcripts Is Potentially a Common Targeting Mechanism of Long Non-Coding RNAs
by Ivan Antonov and Yulia Medvedeva
Genes 2020, 11(12), 1483; https://doi.org/10.3390/genes11121483 - 10 Dec 2020
Cited by 7 | Viewed by 2195
Abstract
Although thousands of mammalian long non-coding RNAs (lncRNAs) have been reported in the last decade, their functional annotation remains limited. A wet-lab approach to detect functions of a novel lncRNA usually includes its knockdown followed by RNA sequencing and identification of the differentially expressed genes. However, identifying the molecular mechanism(s) by which the lncRNA regulates its targets is frequently a challenge. Previously, we developed the ASSA algorithm that detects statistically significant inter-molecular RNA-RNA interactions. Here we designed a workflow that uses ASSA predictions to estimate the ability of an lncRNA to function via direct base pairing with the target transcripts (co- or post-transcriptionally). The workflow was applied to more than 300 lncRNA knockdown experiments from the FANTOM6 pilot project, producing statistically significant predictions for 71 unique lncRNAs (104 knockdowns). Surprisingly, the majority of these lncRNAs were likely to function co-transcriptionally, i.e., hybridize with the nascent transcripts of the target genes. Moreover, a number of the obtained predictions were supported by independent iMARGI experimental data on co-localization of lncRNA and chromatin. We detected an evolutionarily conserved lncRNA, CHASERR (AC013394.2 or LINC01578), that could regulate target genes co-transcriptionally via interaction with a nascent transcript by directing the CHD2 helicase. The obtained results suggest that this nuclear lncRNA may be able to activate expression of the target genes in trans by base-pairing with the nascent transcripts and directing the CHD2 helicase to the regulated promoters, leading to chromatin opening and active transcription. Our study highlights the possible importance of base-pairing between nuclear lncRNAs and nascent transcripts for the regulation of gene expression.
(This article belongs to the Special Issue Algorithms and Workflows in RNA Bioinformatics)
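Direct base pairing between an lncRNA and a target transcript can be screened, in its crudest form, by searching for runs of consecutive complementary bases in antiparallel orientation. The Python sketch below finds the longest such run with a dynamic program, optionally allowing G·U wobble pairs. It is a hypothetical illustration only: ASSA assesses the statistical significance of RNA-RNA interactions, whereas this sketch ignores energies, bulges, internal loops, and significance entirely. The function name and return layout are assumptions.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}

def longest_duplex(lnc, target, allow_gu=True):
    """Longest run of consecutive antiparallel base pairs between two RNAs.

    Returns (length, lnc_start, target_start): lnc[lnc_start:lnc_start+length]
    pairs with target[target_start:target_start+length] read backwards.
    Returns (0, -1, -1) if no pair is possible.
    """
    pairs = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    n = len(target)
    prev = [0] * (n + 1)   # run lengths for the previous lnc position; prev[n] is a 0 sentinel
    best = (0, -1, -1)
    for i, base in enumerate(lnc):
        cur = [0] * (n + 1)
        for j in range(n):
            if (base, target[j]) in pairs:
                cur[j] = prev[j + 1] + 1   # extends the stack ending at pair (i-1, j+1)
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j] + 1, j)
        prev = cur
    return best
```

For example, `GGGAUC` forms a six-pair duplex with the `GAUCCC` stretch starting at position 2 of `UUGAUCCCUU`. Only two rows of the table are kept at a time, so memory grows with the target length rather than with the product of both lengths.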
