Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops
Abstract
1. Introduction
2. Related Work
3. Theoretical Background
3.1. RNA
3.1.1. The Pseudoknot Motif
3.1.2. Bulges and Internal Loops
3.2. Syntactic Pattern Recognition
Context-Free Grammar
4. Proposed Methodology
4.1. CFG to Identify Pseudoknots
4.2. Decorate Core Stems
4.3. Optimal Tree Selection
5. Performance Evaluation
5.1. Dataset Construction
5.2. Methods of Evaluation
5.2.1. Pseudoknots’ Core Stems Prediction
5.2.2. Confusion Matrix, Precision, Recall, F1-Score, and MCC
5.2.3. Execution-Time Comparison
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
AI | Artificial Intelligence |
CFG | Context-free grammar |
CCJ | Chen–Condon–Jabbari |
CNN | Convolutional Neural Network |
CYK | Cocke–Younger–Kasami |
DNA | Deoxyribonucleic acid |
IBPMP | Improved base-pair maximization principle |
LSTM | Long short-term memory |
MCC | Matthews correlation coefficient |
MFE | Minimum free energy |
NP | Nondeterministic polynomial |
RNA | Ribonucleic acid |
SCFG | Stochastic context-free grammar |
YAEP | Yet another Earley parser |
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String | A | C | A | U | C | C | G | C | C | U | G | A | U | U | U | G | A | G | C | A | C | A |
Core stems: | . | . | . | . | [ | . | . | . | . | ( | ] | . | . | . | . | . | ) | . | . | . | . | . |
Stage 1 | . | . | . | . | [ | . | . | . | ( | ( | ] | . | . | . | . | . | ) | ) | . | . | . | . |
Stage 2 | . | . | . | . | [ | . | . | ( | ( | ( | ] | . | . | . | . | . | ) | ) | ) | . | . | . |
Stage 3 | . | . | . | [ | [ | . | . | ( | ( | ( | ] | ] | . | . | . | . | ) | ) | ) | . | . | . |
Stage 4 | . | . | [ | [ | [ | . | . | ( | ( | ( | ] | ] | ] | . | . | . | ) | ) | ) | . | . | . |
Stage 5 | [ | . | [ | [ | [ | . | ( | ( | ( | ( | ] | ] | ] | . | ] | . | ) | ) | ) | . | ) | . |
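The rows above use dot-bracket notation with two bracket families, `[ ]` and `( )`, one per stem of the pseudoknot. As a minimal sketch (not the Knotify+ implementation), the following Python reads such a string into base pairs and checks whether any two pairs cross, which is what distinguishes a pseudoknot from a nested structure. The function names `parse_pairs` and `has_crossing` are hypothetical, and the two strings are transcribed from the "Core stems" and "Stage 5" rows of the table.

```python
from typing import List, Tuple

Pair = Tuple[int, int, str]  # (opening position, closing position, bracket family)


def parse_pairs(structure: str) -> List[Pair]:
    """Return 1-based base pairs from a dot-bracket string using () and []."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}  # one stack per bracket family
    pairs: List[Pair] = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos, opener))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


def has_crossing(pairs: List[Pair]) -> bool:
    """True if two pairs (i, j) and (k, l) interleave as i < k < j < l."""
    return any(
        i < k < j < l
        for (i, j, _) in pairs
        for (k, l, _) in pairs
    )


# Strings transcribed from the table rows (22 positions each).
CORE_STEMS = "....[....(].....)....."
STAGE_5 = "[.[[[.((((]]].].))).)."

if __name__ == "__main__":
    for label, s in (("Core stems", CORE_STEMS), ("Stage 5", STAGE_5)):
        pairs = parse_pairs(s)
        print(label, pairs, "crossing:", has_crossing(pairs))
```

Keeping one stack per bracket family lets each stem be matched independently, so the interleaving of the two stems is preserved rather than rejected as unbalanced. The unpaired positions 2, 14, 16, and 20 in the Stage 5 string are the gaps that appear inside the extended stems.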
Platform | 2 Matches | 2 Matches (%) | 1 Match | At Least 1 Match (%) |
---|---|---|---|---|
IPknot | 38 | 14.62 | 22 | 18.85 |
Knotty | 121 | 46.54 | 47 | 55.58 |
Knotify | 142 | 54.62 | 38 | 61.92 |
Knotify+ | 142 | 54.62 | 45 | 63.27 |
Platform | tp | tn | fp | fn | Precision | Recall | F1-Score | MCC |
---|---|---|---|---|---|---|---|---|
IPknot | 3850 | 3746 | 1488 | 1606 | 0.721 | 0.706 | 0.713 | 0.421 |
Knotty | 5006 | 3331 | 1836 | 517 | 0.732 | 0.906 | 0.810 | 0.574 |
Knotify | 4170 | 4061 | 1154 | 1305 | 0.783 | 0.762 | 0.772 | 0.540 |
Knotify+ | 4342 | 3975 | 1306 | 1053 | 0.769 | 0.805 | 0.786 | 0.558 |
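As a check on the table above, the sketch below recomputes precision, recall, F1-score, and MCC from the tp, tn, fp, and fn counts. It assumes the standard textbook definitions of these metrics, since the formulas themselves are not reproduced here; the helper name `scores` and the `COUNTS` dictionary are illustrative only. Under that assumption, the counts reproduce the tabulated values to three decimals.

```python
from math import sqrt
from typing import Dict, Tuple


def scores(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float, float]:
    """Return (precision, recall, F1, MCC) using the standard definitions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return precision, recall, f1, mcc


# (tp, tn, fp, fn) copied from the table rows.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "IPknot": (3850, 3746, 1488, 1606),
    "Knotty": (5006, 3331, 1836, 517),
    "Knotify": (4170, 4061, 1154, 1305),
    "Knotify+": (4342, 3975, 1306, 1053),
}

if __name__ == "__main__":
    for platform, counts in COUNTS.items():
        p, r, f1, mcc = scores(*counts)
        print(f"{platform:9s} precision={p:.3f} recall={r:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

MCC is the only one of the four metrics that uses the true negatives, which is why it can rank platforms differently from F1-score when the fp and fn counts are unbalanced.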
Platform | tp (L < 30) | tn (L < 30) | fp (L < 30) | fn (L < 30) | tp (30 ≤ L < 40) | tn (30 ≤ L < 40) | fp (30 ≤ L < 40) | fn (30 ≤ L < 40) | tp (40 ≤ L < 50) | tn (40 ≤ L < 50) | fp (40 ≤ L < 50) | fn (40 ≤ L < 50) | tp (L ≥ 50) | tn (L ≥ 50) | fp (L ≥ 50) | fn (L ≥ 50) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
IPknot | 916 | 514 | 124 | 337 | 824 | 810 | 294 | 355 | 754 | 897 | 396 | 284 | 1368 | 1519 | 674 | 631 | |
Knotty | 1196 | 469 | 146 | 80 | 1064 | 786 | 316 | 117 | 894 | 803 | 510 | 124 | 1848 | 1264 | 876 | 204 | |
Knotify | 1230 | 490 | 132 | 39 | 748 | 991 | 288 | 304 | 748 | 991 | 288 | 304 | 1218 | 1723 | 420 | 831 | |
Knotify+ | 1248 | 486 | 132 | 25 | 1010 | 847 | 316 | 110 | 798 | 1004 | 328 | 242 | 1286 | 1638 | 530 | 676 |
Platform | Total Time (s) | Average Time (s) |
---|---|---|
IPknot | 117.02 | 0.45 |
Knotty | 582.91 | 2.24 |
Knotify | 56.43 | 0.22 |
Knotify+ | 74.05 | 0.28 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Makris, E.; Kolaitis, A.; Andrikos, C.; Moulos, V.; Tsanakas, P.; Pavlatos, C. Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops. Biomolecules 2023, 13, 308. https://doi.org/10.3390/biom13020308