A Graph-Based Algorithm for Detecting Long Non-Coding RNAs Through RNA Secondary Structure Analysis
Abstract
1. Introduction
2. Rooted Tree Graphs
3. From RNA Sequences to Graphs
3.1. From Secondary Structures to Strings in Dot–Bracket Notation
3.2. The Associated Rooted Tree Graphs
- The 5′ and 3′ ends are considered the root vertex.
- A bulge or hairpin is considered a vertex when there are two or more consecutive unmatched nucleotides.
- A junction is considered a vertex.
- A stem is considered an edge if it has two or more complementary base pairs.
3.3. From DBN to SDBN
- i. If a stem consists of only one complementary base pair, it is removed. Bulges or hairpins that do not contain two or more consecutive unmatched nucleotides are also removed.
- ii. Every bulge (or junction) must have at least one dot separating each of the convergent stems; where this is not the case, one dot is inserted.
- iii. Finally, each run of consecutive identical characters is reduced to a single character.
- (a) .((((((....(((((((((((..(((.(..((.....))..).)))..)))...)))))))).((((((((((.((((((..((....)))))))).))))))))))))))))..
- (b) .((((((....(((((((((((..(((....((.....))....)))..)))...)))))))).((((((((((.((((((..((....)))))))).))))))))))))))))..
- (c) .((((((....(((((((((((..(((....((.....))....)))..)))...)))))))).(((((((((((((((((..((....)))))))))))))))))))))))))..
- (d) .((((((....((((((((.(((..(((....((.....))....)))..)))...)))))))).(((((((((((((((((..((....)).))))))))))))))))).))))))..
- (e) .(.(.(.(.(.).).).).(.(.).).).
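Step iii is the simplest of the three to express in code. The following Python sketch (the function name `collapse_runs` is hypothetical and not taken from the paper) collapses each run of identical characters and applies it to string (d) from the example above, which already reflects steps i and ii.

```python
from itertools import groupby


def collapse_runs(dbn: str) -> str:
    """Step iii: reduce every run of identical characters to one character."""
    return "".join(ch for ch, _ in groupby(dbn))


# String (d) from the example above, i.e. after steps i and ii.
d = (
    ".((((((....((((((((.(((..(((....((.....))....)))..)))...)))))))).((((((((("
    "((((((((..((....)).))))))))))))))))).)))))).."
)

print(collapse_runs(d))  # .(.(.(.(.(.).).).).(.(.).).).
```

The output coincides with string (e). Steps i and ii are not covered by this sketch; both need the base-pair table of the structure rather than a purely character-level pass.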
4. The Comparing Algorithm
The Algorithm
5. Results
5.1. Computations
5.2. Using Other RNA Folding Software Programs
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Pseudocode:
Control Set RNA | Length (nt) | Control Set RNA | Length (nt) | Test Set RNA | Length (nt) |
---|---|---|---|---|---|
ICR1 | 3199 | 15S Ribosomal | 1649 | KAP123 | 3342 |
RME2 | 2223 | LSR1 | 1175 | GRE2 | 1029 |
RME3 | 1905 | Telomerase | 1158 | ECM11 | 909 |
IRT1 | 1489 | SNR86 | 1004 | 1477 | 843 |
TLC1 | 1301 | SNR30 | 609 | 6754 | 840 |
PWR1 | 941 | Small nuclear SNR30 | 606 | 12189 | 583 |
RUF5-1 | 710 | SNR19 | 568 | | |
RUF21 | 707 | SNR84 | 550 | | |
ETS1-1 | 700 | Small nuclear SNR84 | 537 | | |
SRG1 | 551 | RPM1 | 483 | | |
RUF22 | 515 | SNR17B | 462 | | |
RUF20 | 443 | Nuclear RNASE P | 358 | | |
ITS1-1 | 361 | SNR42 | 351 | | |
RUF23 | 254 | NME1 | 340 | | |
ITS2-1 | 232 | U3 | 334 | | |
ETS2-1 | 211 | Small nuclear U3 | 333 | | |
RNA170 | 169 | RNASE MRP | 332 | | |
ZOD1 | 58 | SNR83 | 306 | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cabrera-Ibarra, H.; Hernández-Granados, D.; Riego-Ruiz, L. A Graph-Based Algorithm for Detecting Long Non-Coding RNAs Through RNA Secondary Structure Analysis. Algorithms 2025, 18, 652. https://doi.org/10.3390/a18100652