Grammar-Based Computational Framework for Predicting Pseudoknots of K-Type and M-Type in RNA Secondary Structures
Abstract
1. Introduction
2. Theoretical Background
2.1. Pseudoknots in RNA
2.2. Pattern-Based Syntax Analysis
Context-Free Grammars
3. Related Work
4. Overview of Our Approach
4.1. Grammar Definition for the Detection of K-Type Pseudoknots
4.2. Grammar Definition for the Detection of M-Type Pseudoknots
4.3. Core Stems Decoration
4.4. Optimal Pseudoknot Selection
1. The Minimum Free Energy (MFE) method [45], which determines the RNA structure with the lowest free energy. Although this approach is grounded in the second law of thermodynamics, the predicted structure does not always correspond to the one the molecule adopts under natural conditions.
2. The maximum pairing principle [46], which counts the base pairs formed around the critical stems of the pseudoknot. In dot-bracket notation, the configurations with the highest number of base pairs around the pseudoknot generally correspond to the structures with minimum free energy.
3. The partition function method [47], which assumes that the true base pairs are those with the highest probability in the thermodynamic ensemble of structures at a given temperature, rather than only those of the single MFE structure; weighting the contributions of alternative, near-optimal structures in this way improves prediction accuracy.
4. Comparative sequence analysis [48], which investigates substitution patterns in pairwise alignments of homologous sequences.
5. Physical experiments [49], i.e., laboratory techniques used to validate predictions.
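Criterion 2 becomes concrete in dot-bracket notation: counting base pairs means counting matched brackets. The sketch below is a minimal illustration, not the paper's implementation. It assumes that each bracket type (`( )`, `[ ]`, `{ }`, plus `< >` as an extra, hypothetical type) marks one family of pairs, so crossing pseudoknot pairs can be recovered with one stack per bracket type. The helper names `base_pairs` and `max_pairing_choice` are our own. As candidates it uses the Phase 1 to Phase 3 rows of the example table in this article.

```python
# Hypothetical helpers illustrating the maximum pairing criterion.
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(structure):
    """Return the sorted (i, j) pairs encoded in a dot-bracket string.

    Each bracket type has its own stack, so crossing (pseudoknotted)
    pairs such as '([)]' are decoded correctly.
    """
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {j}")
    if any(stacks.values()):
        raise ValueError("unclosed brackets remain")
    return sorted(pairs)


def max_pairing_choice(candidates):
    """Pick the candidate structure with the most base pairs."""
    return max(candidates, key=lambda s: len(base_pairs(s)))


candidates = [
    ".((..[..))..{..]..}...",  # Phase 1
    ".((..[..)).{{..]..}}..",  # Phase 2
    ".(([[[..)).{{..]]]}}..",  # Phase 3
]
best = max_pairing_choice(candidates)
print(best, len(base_pairs(best)))  # .(([[[..)).{{..]]]}}.. 7
```

Ties go to the first candidate, because Python's `max` returns the first maximal element; a practical selector would presumably combine this count with the other criteria listed above.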
5. Conclusions and Future Work
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Marcia, M.; Humphris-Narayanan, E.; Keating, K.S.; Somarowthu, S.; Rajashankar, K.; Pyle, A.M. Solving nucleic acid structures by molecular replacement: Examples from group II intron studies. Acta Crystallogr. D Biol. Crystallogr. 2013, 69, 2174–2185.
2. Zhao, Q.; Zhao, Z.; Fan, X.; Yuan, Z.; Mao, Q.; Yao, Y. Review of Machine Learning Methods for RNA Secondary Structure Prediction. PLoS Comput. Biol. 2021, 17, e1009291.
3. Andrikos, C.; Makris, E.; Kolaitis, A.; Rassias, G.; Pavlatos, C.; Tsanakas, P. Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition. Methods Protoc. 2022, 5, 14.
4. Makris, E.; Kolaitis, A.; Andrikos, C.; Moulos, V.; Tsanakas, P.; Pavlatos, C. Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops. Biomolecules 2023, 13, 308.
5. Koroulis, C.; Makris, E.; Kolaitis, A.; Tsanakas, P.; Pavlatos, C. Syntactic Pattern Recognition for the Prediction of L-Type Pseudoknots in RNA. Appl. Sci. 2023, 13, 5168.
6. Makris, E.; Kolaitis, A.; Andrikos, C.; Moulos, V.; Tsanakas, P.; Pavlatos, C. An intelligent grammar-based platform for RNA H-type pseudoknot prediction. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Cham, Switzerland, 2022; Volume 652.
7. Watson, J.; Crick, F. Molecular Structure of Nucleic Acids. Am. J. Psychiatry 2003, 160, 623–624.
8. Rietveld, K.; Van Poelgeest, R.; Pleij, C.W.; Van Boom, J.; Bosch, L. The tRNA-like structure at the 3′ terminus of turnip yellow mosaic virus RNA. Differences and similarities with canonical tRNA. Nucleic Acids Res. 1982, 10, 1929–1946.
9. Kucharík, M.; Hofacker, I.L.; Stadler, P.F.; Qin, J. Pseudoknots in RNA folding landscapes. Bioinformatics 2016, 32, 187–194.
10. Staple, D.W.; Butcher, S.E. Pseudoknots: RNA structures with diverse functions. PLoS Biol. 2005, 3, e213.
11. Hopcroft, J.E.; Ullman, J.D. Formal Languages and Their Relation to Automata; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1969.
12. Chomsky, N. Three models for the description of language. IRE Trans. Inf. Theory 1956, 2, 113–124.
13. Pavlatos, C.; Vita, V.; Ekonomou, L. Syntactic pattern recognition of power system signals. In Proceedings of the 19th WSEAS International Conference on Systems (Part of CSCC’15), Zakynthos Island, Greece, 16–20 July 2015; pp. 16–20.
14. Panagopoulos, I.; Pavlatos, C.; Papakonstantinou, G. An Embedded System for Artificial Intelligence Applications. Int. J. Comput. Intell. 2004, 1, 1155–1169.
15. Pavlatos, C.; Panagopoulos, I.; Papakonstantinou, G. A programmable pipelined coprocessor for parsing applications. In Proceedings of the Workshop on Application Specific Processors (WASP) CODES, Stockholm, Sweden, 7 September 2004; Volume 294.
16. Pavlatos, C.; Dimopoulos, A.; Papakonstantinou, G. An intelligent embedded system for control applications. In Proceedings of the Workshop on Modeling and Control of Complex Systems, Ayia Napa, Cyprus, 30 June–1 July 2005.
17. Younger, D.H. Recognition and parsing of context-free languages in time n³. Inf. Control 1967, 10, 189–208.
18. Earley, J. An efficient context-free parsing algorithm. Commun. ACM 1970, 13, 94–102.
19. Graham, S.L.; Harrison, M.A.; Ruzzo, W.L. An improved context-free recognizer. ACM Trans. Program. Lang. Syst. 1980, 2, 415–462.
20. Ruzzo, W.L. General Context-Free Language Recognition. Ph.D. Thesis, University of California, Berkeley, CA, USA, 1978.
21. Geng, T.; Xu, F.; Mei, H.; Meng, W.; Chen, Z.; Lai, C. A practical GLR parser generator for software reverse engineering. JNW 2014, 9, 769–776.
22. Pavlatos, C.; Dimopoulos, A.C.; Koulouris, A.; Andronikos, T.; Panagopoulos, I.; Papakonstantinou, G. Efficient reconfigurable embedded parsers. Comput. Lang. Syst. Struct. 2009, 35, 196–215.
23. Chiang, Y.; Fu, K. Parallel parsing algorithms and VLSI implementations for syntactic pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 302–314.
24. Akutsu, T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discret. Appl. Math. 2000, 104, 45–62.
25. Jabbari, H.; Wark, I.; Montemagno, C.; Will, S. Knotty: Efficient and accurate prediction of complex RNA pseudoknot structures. Bioinformatics 2018, 34, 3849–3856.
26. Bellaousov, S.; Mathews, D.H. ProbKnot: Fast prediction of RNA secondary structure including pseudoknots. RNA 2010, 16, 1870–1880.
27. Sato, K.; Kato, Y.; Hamada, M.; Akutsu, T.; Asai, K. IPknot: Fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics 2011, 27, 85–93.
28. Sato, K.; Kato, Y. Prediction of RNA secondary structure including pseudoknots for long sequences. Briefings Bioinform. 2021, 23, bbab395.
29. Singh, J.; Hanson, J.; Paliwal, K.; Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 2019, 10, 5407.
30. Wang, L.; Liu, Y.; Zhong, X.; Liu, H.; Lu, C.; Li, C.; Zhang, H. DMfold: A novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair Maximization Principle. Front. Genet. 2019, 10, 143.
31. Kangkun, M.; Jun, W.; Yi, X. Prediction of RNA secondary structure with pseudoknots using coupled deep neural networks. Biophys. Rep. 2020, 6, 146–154.
32. Wang, Y.; Liu, Y.; Wang, S.; Liu, Z.; Gao, Y.; Zhang, H.; Dong, L. ATTfold: RNA secondary structure prediction with pseudoknots based on attention mechanism. Front. Genet. 2020, 11, 1564.
33. Fu, L.; Cao, Y.; Wu, J.; Peng, Q.; Nie, Q.; Xie, X. UFold: Fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res. 2021, 50, e14.
34. Knudsen, B.; Hein, J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999, 15, 446–454.
35. Knudsen, B.; Hein, J. Pfold: RNA Secondary Structure Prediction Using Stochastic Context-Free Grammars. Nucleic Acids Res. 2003, 31, 3423–3428.
36. Sukosd, Z.; Knudsen, B.; Vaerum, M.; Kjems, J.; Andersen, E.S. Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars. BMC Bioinform. 2011, 12, 103.
37. Pedersen, J.S.; Meyer, I.M.; Forsberg, R.; Simmonds, P.; Hein, J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004, 32, 4925–4936.
38. Do, C.B.; Woods, D.A.; Batzoglou, S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006, 22, e90–e98.
39. Pedersen, J.S.; Bejerano, G.; Siepel, A.; Rosenbloom, K.; Lindblad-Toh, K.; Lander, E.S.; Kent, J.; Miller, W.; Haussler, D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol. 2006, 2, e33.
40. Nawrocki, E.P.; Kolbe, D.L.; Eddy, S.R. Infernal 1.0: Inference of RNA alignments. Bioinformatics 2009, 25, 1335–1337.
41. Anderson, J.W.; Haas, P.A.; Mathieson, L.A.; Volynkin, V.; Lyngsø, R.; Tataru, P.; Hein, J. Oxfold: Kinetic folding of RNA using stochastic context-free grammars and evolutionary information. Bioinformatics 2013, 29, 704–710.
42. Bradley, R.K.; Pachter, L.; Holmes, I. Specific alignment of structured RNA: Stochastic grammars and sequence annealing. Bioinformatics 2008, 24, 2677–2683.
43. Isambert, H.; Siggia, E.D. Modeling RNA folding paths with pseudoknots: Application to hepatitis delta virus ribozyme. Proc. Natl. Acad. Sci. USA 2000, 97, 6515–6520.
44. YAEP (Yet Another Earley Parser) - C++ Interface. Available online: https://github.com/vnmakarov/yaep (accessed on 1 October 2024).
45. Trotta, E. On the normalization of the minimum free energy of RNAs by sequence length. PLoS ONE 2014, 9, e113380.
46. Nussinov, R.; Jacobson, A.B. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc. Natl. Acad. Sci. USA 1980, 77, 6309–6313.
47. Mathews, D.H. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 2004, 10, 1178–1190.
48. Rivas, E.; Eddy, S.R. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinform. 2001, 2, 8.
49. Chu, Y.; Corey, D.R. RNA Sequencing: Platform Selection, Experimental Design, and Data Interpretation. Nucleic Acid Ther. 2012, 22, 271–274.
Platform | Methodology |
---|---|
Knotty | CCJ algorithm with sparsification |
ProbKnot | base pair probabilities with maximum expected accuracy |
IPknot | Integer programming |
Knotify | CFG with MFE and maximum base pairs |
2d RNA | bidirectional LSTM/FCN |
ATTfold | CNN/FCN |
UFold | deep learning (U-Net) |
CONTRAfold | SCFG |
EvoFold | SCFG |
Infernal | SCFG |
Oxfold | SCFG |
Stemloc | SCFG |
Rule Number | Syntactic Rules |
---|---|
0 | R → “A” X “A” X “U” D “A” X “U” X “U” |
1 | R → “A” X “A” X “U” D “U” X “U” X “A” |
2 | R → “A” X “A” X “U” D “C” X “U” X “G” |
3 | R → “A” X “A” X “U” D “G” X “U” X “C” |
4 | R → “A” X “U” X “U” D “A” X “A” X “U” |
5 | R → “A” X “U” X “U” D “U” X “A” X “A” |
6 | R → “A” X “U” X “U” D “C” X “A” X “G” |
7 | R → “A” X “U” X “U” D “G” X “A” X “C” |
8 | R → “A” X “G” X “U” D “A” X “C” X “U” |
9 | R → “A” X “G” X “U” D “U” X “C” X “A” |
10 | R → “A” X “G” X “U” D “C” X “C” X “G” |
11 | R → “A” X “G” X “U” D “G” X “C” X “C” |
12 | R → “A” X “C” X “U” D “A” X “G” X “U” |
13 | R → “A” X “C” X “U” D “U” X “G” X “A” |
14 | R → “A” X “C” X “U” D “C” X “G” X “G” |
15 | R → “A” X “C” X “U” D “G” X “G” X “C” |
. | |
. | |
. | |
60 | R → “C” X “C” X “G” D “A” X “G” X “U” |
61 | R → “C” X “C” X “G” D “U” X “G” X “A” |
62 | R → “C” X “C” X “G” D “G” X “G” X “C” |
63 | R → “C” X “C” X “G” D “C” X “G” X “G” |
64 | X → “A” X |
65 | X → “U” X |
66 | X → “C” X |
67 | X → “G” X |
68 | X → “A” |
69 | X → “U” |
70 | X → “C” |
71 | X → “G” |
72 | D → Y Y |
73 | Y → “A” |
74 | Y → “U” |
75 | Y → “C” |
76 | Y → “G” |
77 | Y → ε |
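Because X only generates non-empty runs of nucleotides and D (through Y) at most two nucleotides, each K-type R-production in the table above happens to describe a regular language. The sketch below uses that observation to regenerate the 64 R-productions from the four canonical pairs and to turn each one into a regular expression, giving a quick recognizer for a candidate segment. This is a swapped-in technique for illustration only, not the Earley-based parsing used by the framework: `k_type_rules`, `helper_rules`, `rule_to_regex` and `is_k_type` are hypothetical names, terminals are written with straight ASCII quotes, and beyond rule 0 the enumeration order is not guaranteed to follow the table's numbering.

```python
import re
from itertools import product

# Canonical Watson-Crick pairs (opening base, closing base).
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
NUCLEOTIDES = ["A", "U", "C", "G"]


def k_type_rules():
    """Enumerate the 64 R-productions p1 X p2 X p1' D p3 X p2' X p3'."""
    rules = []
    for (a1, b1), (a2, b2), (a3, b3) in product(PAIRS, repeat=3):
        rules.append(f'R → "{a1}" X "{a2}" X "{b1}" D "{a3}" X "{b2}" X "{b3}"')
    return rules


def helper_rules():
    """X: one or more nucleotides; D: zero to two nucleotides via Y."""
    rules = [f'X → "{n}" X' for n in NUCLEOTIDES]
    rules += [f'X → "{n}"' for n in NUCLEOTIDES]
    rules.append("D → Y Y")
    rules += [f'Y → "{n}"' for n in NUCLEOTIDES]
    rules.append("Y → ε")
    return rules


def rule_to_regex(rule):
    """Translate one R-production into an equivalent regular expression."""
    pieces = []
    for token in rule.split("→")[1].split():
        if token == "X":
            pieces.append("[AUCG]+")       # X → nucleotide X | nucleotide
        elif token == "D":
            pieces.append("[AUCG]{0,2}")   # D → Y Y, Y → nucleotide | ε
        else:
            pieces.append(token.strip('"'))  # quoted terminal base
    return re.compile("".join(pieces))


K_TYPE_PATTERNS = [rule_to_regex(r) for r in k_type_rules()]


def is_k_type(segment):
    """True if the whole segment can be derived from R."""
    return any(p.fullmatch(segment) for p in K_TYPE_PATTERNS)


print(len(k_type_rules()), len(helper_rules()))  # 64 14
print(is_k_type("ACGAUCACAG"))                   # True
```

The 64 R-productions plus 14 helper productions account for rules 0 to 77. Note that `fullmatch` only checks whether an entire segment fits the pattern; locating pseudoknot candidates inside a longer sequence, as the parser does, would require scanning substrings.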
Rule Number | Syntactic Rules |
---|---|
0 | R → “A” X “A” X “A” X “U” D “A” X “U” X “U” X “U” |
1 | R → “A” X “A” X “A” X “U” D “U” X “U” X “U” X “A” |
2 | R → “A” X “A” X “A” X “U” D “G” X “U” X “U” X “C” |
3 | R → “A” X “A” X “A” X “U” D “C” X “U” X “U” X “G” |
4 | R → “A” X “A” X “U” X “U” D “A” X “U” X “A” X “U” |
5 | R → “A” X “A” X “U” X “U” D “U” X “U” X “A” X “A” |
6 | R → “A” X “A” X “U” X “U” D “G” X “U” X “A” X “C” |
7 | R → “A” X “A” X “U” X “U” D “C” X “U” X “A” X “G” |
8 | R → “A” X “A” X “G” X “U” D “A” X “U” X “C” X “U” |
9 | R → “A” X “A” X “G” X “U” D “U” X “U” X “C” X “A” |
10 | R → “A” X “A” X “G” X “U” D “G” X “U” X “C” X “C” |
11 | R → “A” X “A” X “G” X “U” D “C” X “U” X “C” X “G” |
12 | R → “A” X “A” X “C” X “U” D “A” X “U” X “G” X “U” |
13 | R → “A” X “A” X “C” X “U” D “U” X “U” X “G” X “A” |
14 | R → “A” X “A” X “C” X “U” D “G” X “U” X “G” X “C” |
15 | R → “A” X “A” X “C” X “U” D “C” X “U” X “G” X “G” |
. | |
. | |
. | |
252 | R → “C” X “C” X “C” X “G” D “A” X “G” X “G” X “U” |
253 | R → “C” X “C” X “C” X “G” D “U” X “G” X “G” X “A” |
254 | R → “C” X “C” X “C” X “G” D “G” X “G” X “G” X “C” |
255 | R → “C” X “C” X “C” X “G” D “C” X “G” X “G” X “G” |
256 | X → “A” X |
257 | X → “U” X |
258 | X → “C” X |
259 | X → “G” X |
260 | X → “A” |
261 | X → “U” |
262 | X → “C” |
263 | X → “G” |
264 | D → Y Y |
265 | Y → “A” |
266 | Y → “U” |
267 | Y → “C” |
268 | Y → “G” |
269 | Y → ε |
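Both grammars appear to follow the same recipe: fix a layout of stem ends, then write one R-production for every choice of canonical pair at each stem, so a layout with k stems yields 4^k productions (64 for K-type, 256 for M-type). The sketch below, a minimal illustration under our own conventions, expands a bracket template into those productions. The template strings, the `<` `>` label for the fourth M-type stem and the function name `rules_from_template` are hypothetical; the layouts were read off the R-productions in the two tables. With the pairs ordered A-U, U-A, G-C, C-G, the M-type enumeration reproduces, for example, rules 0, 4 and 252 of the table above at the same indices.

```python
from itertools import product

# Canonical Watson-Crick pairs (opening base, closing base).
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
OPEN, CLOSE = "([{<", ")]}>"


def rules_from_template(template):
    """Expand a bracket template into all R-productions.

    Every opening bracket starts a stem and its matching closing bracket
    ends it; the two receive the halves of one canonical pair.
    'X' and 'D' are copied through unchanged as non-terminals.
    Assumes each bracket type appears at most once in the template.
    """
    tokens = template.split()
    stems = [t for t in tokens if t in OPEN]
    rules = []
    # The last stem varies fastest, the first stem slowest.
    for choice in product(PAIRS, repeat=len(stems)):
        pair_of = dict(zip(stems, choice))
        body = []
        for t in tokens:
            if t in OPEN:
                body.append(f'"{pair_of[t][0]}"')
            elif t in CLOSE:
                opener = OPEN[CLOSE.index(t)]
                body.append(f'"{pair_of[opener][1]}"')
            else:
                body.append(t)
        rules.append("R → " + " ".join(body))
    return rules


# Hypothetical templates: stem ends as brackets, gaps as X and D.
K_TYPE = "( X [ X ) D { X ] X }"
M_TYPE = "( X [ X { X ) D < X ] X } X >"

m_rules = rules_from_template(M_TYPE)
print(len(rules_from_template(K_TYPE)), len(m_rules), len(set(m_rules)))  # 64 256 256
print(m_rules[0])  # R → "A" X "A" X "A" X "U" D "A" X "U" X "U" X "U"
```

Generating the productions from a template keeps the rule set free of accidental duplicates or omissions, which is easy to get wrong when several hundred rules are written out by hand.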
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RNA | C | U | A | C | C | G | C | C | U | A | C | U | C | A | A | C | G | G | G | A | C | C |
Parser output | . | . | ( | . | . | [ | . | . | ) | . | . | . | { | . | . | ] | . | . | } | . | . | . |
Phase 1 | . | ( | ( | . | . | [ | . | . | ) | ) | . | . | { | . | . | ] | . | . | } | . | . | . |
Phase 2 | . | ( | ( | . | . | [ | . | . | ) | ) | . | { | { | . | . | ] | . | . | } | } | . | . |
Phase 3 | . | ( | ( | [ | [ | [ | . | . | ) | ) | . | { | { | . | . | ] | ] | ] | } | } | . | . |
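The three phases in the table above are consistent with a simple outward extension of each core stem. Starting from the pair found by the parser, a stem grows by one pair at a time while both flanking positions are unpaired and form a canonical Watson-Crick pair. The sketch below reproduces the three phase rows under exactly those assumptions: outward growth only, no G-U wobble pairs, and stems processed in the order `(`, `{`, `[` as in the table. These rules are our reading of the example rather than a statement of the paper's procedure, and `extend_stems` is a hypothetical helper.

```python
# Canonical Watson-Crick pairs only (assumption: no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
CLOSER = {"(": ")", "[": "]", "{": "}"}


def extend_stems(seq, core, order):
    """Grow each core stem outward and return the structure after each phase.

    seq   -- RNA sequence
    core  -- dict: bracket opener -> (i, j) core pair reported by the parser
    order -- openers in the order their stems are extended (one phase each)
    """
    struct = ["."] * len(seq)
    for opener, (i, j) in core.items():
        struct[i], struct[j] = opener, CLOSER[opener]
    phases = []
    for opener in order:
        i, j = core[opener]
        # Step outward while both neighbours are free and complementary.
        while (i - 1 >= 0 and j + 1 < len(seq)
               and struct[i - 1] == "." and struct[j + 1] == "."
               and (seq[i - 1], seq[j + 1]) in PAIRS):
            i, j = i - 1, j + 1
            struct[i], struct[j] = opener, CLOSER[opener]
        phases.append("".join(struct))
    return phases


seq = "CUACCGCCUACUCAACGGGACC"                      # RNA row of the table
core = {"(": (2, 8), "[": (5, 15), "{": (12, 18)}  # parser output row
for phase in extend_stems(seq, core, ["(", "{", "["]):
    print(phase)
# .((..[..))..{..]..}...
# .((..[..)).{{..]..}}..
# .(([[[..)).{{..]]]}}..
```

In Phase 3 the `[` stem stops after two new pairs because position 2 is already claimed by the `(` stem; for this particular sequence, processing the stems in a different order happens to give the same final structure.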
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pavlatos, C. Grammar-Based Computational Framework for Predicting Pseudoknots of K-Type and M-Type in RNA Secondary Structures. Eng 2024, 5, 2531-2543. https://doi.org/10.3390/eng5040132