Direct Inference of Base-Pairing Probabilities with Neural Networks Improves Prediction of RNA Secondary Structures with Pseudoknots
Abstract
1. Introduction
2. Methods
2.1. Preliminaries
2.2. MEA-Based Scoring Function
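As an illustration only, assume the MEA score takes the γ-centroid form used by centroid-type estimators. Given a base-pairing probability matrix p and a weight γ > 0, predicting a pair (i, j) adds (γ + 1)·p_ij - 1 to the score, up to a constant. Only pairs with p_ij > 1/(γ + 1) therefore raise it. The helper names below are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def mea_weight(p_ij: float, gamma: float) -> float:
    """Gain from predicting pair (i, j) whose base-pairing probability is p_ij."""
    return (gamma + 1.0) * p_ij - 1.0


def mea_score(pairs: Iterable[Tuple[int, int]], bpp: Matrix, gamma: float) -> float:
    """Score of a structure given as (i, j) pairs with i < j (constant terms dropped)."""
    return sum(mea_weight(bpp[i][j], gamma) for i, j in pairs)


if __name__ == "__main__":
    n = 6
    bpp = [[0.0] * n for _ in range(n)]
    bpp[0][5], bpp[1][4] = 0.6, 0.3
    # gamma = 1: threshold 1/2, so the weak pair (1, 4) lowers the score.
    print(mea_score([(0, 5)], bpp, 1.0), mea_score([(0, 5), (1, 4)], bpp, 1.0))
    # gamma = 3: threshold 1/4, so the same pair now raises it.
    print(mea_score([(0, 5), (1, 4)], bpp, 3.0))
```

A larger γ lowers the probability threshold. It trades positive predictive value for sensitivity.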
2.3. Decoding Algorithms
2.3.1. Nussinov-Style Decoding Algorithm for Pseudoknot-Free Structures
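A Nussinov-style decoder can be sketched as a dynamic program over pseudoknot-free structures. It maximizes the sum of MEA weights (γ + 1)·p_ij - 1 and only ever adds pairs whose weight is positive. The minimum hairpin length `min_loop = 3` and the function name are assumptions for illustration, not details confirmed by the paper.

```python
from typing import List, Tuple


def nussinov_mea(bpp: List[List[float]], gamma: float,
                 min_loop: int = 3) -> List[Tuple[int, int]]:
    """Pseudoknot-free structure maximizing the sum of (gamma + 1) * p_ij - 1."""
    n = len(bpp)
    w = [[(gamma + 1.0) * bpp[i][j] - 1.0 for j in range(n)] for i in range(n)]
    # best[i][j]: maximum total weight of a nested structure on bases i..j.
    best = [[0.0] * n for _ in range(n)]

    def split(i: int, k: int, j: int) -> float:
        """Score when k pairs with j: left part i..k-1, pair (k, j), inside k+1..j-1."""
        left = best[i][k - 1] if k > i else 0.0
        inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0.0
        return left + w[k][j] + inner

    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            cand = best[i][j - 1]  # base j left unpaired
            for k in range(i, j - min_loop):  # enforces j - k > min_loop
                if w[k][j] > 0.0:
                    cand = max(cand, split(i, k, j))
            best[i][j] = cand

    # Traceback: recompute the same candidates and follow the one that matched.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if w[k][j] > 0.0 and best[i][j] == split(i, k, j):
                pairs.append((k, j))
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return sorted(pairs)


if __name__ == "__main__":
    n = 10
    bpp = [[0.0] * n for _ in range(n)]
    bpp[0][9], bpp[1][8], bpp[2][7] = 0.9, 0.8, 0.3
    print(nussinov_mea(bpp, gamma=1.0))  # pair (2, 7) is below 1/(gamma + 1)
    print(nussinov_mea(bpp, gamma=3.0))  # a larger gamma admits it
```

The traceback compares floats for equality. This is safe here because each candidate is recomputed with the same expression, in the same order, as in the fill step.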
2.3.2. IPknot-Style Decoding Algorithm for Pseudoknotted Structures
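IPknot decomposes a pseudoknotted structure into pseudoknot-free levels, gives each level its own γ, and optimizes all levels jointly by integer programming. The sketch below deliberately swaps that integer program for a much simpler greedy heuristic, so it is not IPknot's method. It works as follows:

- Decode level 1 with a Nussinov-style dynamic program.
- Mask out the bases already paired.
- Decode level 2 with its own γ.
- Keep only the level-2 pairs that cross some level-1 pair.

The function names, `min_loop = 3` and the default of two levels are assumptions.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def decode_level(w: List[List[float]], min_loop: int = 3) -> List[Pair]:
    """Nested (pseudoknot-free) pairs maximizing the sum of positive weights w[i][j]."""
    n = len(w)
    # best[i][j] covers bases i..j-1; choice[i][j] is the partner of base j-1, or -1.
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    choice = [[-1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best[i][j] = best[i][j - 1]
            for k in range(i, j - 1 - min_loop):
                if w[k][j - 1] > 0.0:
                    s = best[i][k] + w[k][j - 1] + best[k + 1][j - 1]
                    if s > best[i][j]:
                        best[i][j], choice[i][j] = s, k
    pairs: List[Pair] = []
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        k = choice[i][j]
        if k < 0:
            stack.append((i, j - 1))
        else:
            pairs.append((k, j - 1))
            stack.extend([(i, k), (k + 1, j - 1)])
    return sorted(pairs)


def crosses(a: Pair, b: Pair) -> bool:
    """True if pairs a and b form a pseudoknot (i < k < j < l or k < i < l < j)."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def levelwise_decode(bpp: List[List[float]], gammas: Sequence[float] = (1.0, 1.0),
                     min_loop: int = 3) -> List[List[Pair]]:
    """Greedy stand-in for a joint integer program over levels (not IPknot itself)."""
    n = len(bpp)
    levels: List[List[Pair]] = []
    used: Set[int] = set()
    for gamma in gammas:
        # MEA weights for this level; bases paired in earlier levels are masked out.
        w = [[(gamma + 1.0) * bpp[i][j] - 1.0 if i not in used and j not in used else -1.0
              for j in range(n)] for i in range(n)]
        level = decode_level(w, min_loop)
        if levels:
            # Simplification: later levels keep only pairs crossing a level-1 pair.
            level = [p for p in level if any(crosses(p, q) for q in levels[0])]
        levels.append(level)
        used.update(base for pair in level for base in pair)
    return levels
```

For an H-type pseudoknot, the stronger stem lands in level 1. The crossing stem is recovered in level 2. A greedy pass like this can miss structures that the joint optimization would find.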
2.4. Inferring Base-Pairing Probabilities
2.4.1. Traditional Models for Base-Pairing Probabilities
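Traditional models such as McCaskill's algorithm obtain base-pairing probabilities from the partition function over all secondary structures, using an inside-outside dynamic program. The sketch below keeps that inside-outside structure but replaces the nearest-neighbor energy model with a toy one. In the toy model, every canonical pair (AU, CG, GU) contributes the same Boltzmann factor exp(`pair_weight`), and there are no loop or stacking terms. The names `pair_probs`, `pair_weight` and `min_loop` are hypothetical.

```python
import math
from typing import List

# Toy model: canonical pairs only, one Boltzmann factor per pair, no loop energies.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_probs(seq: str, pair_weight: float = 1.0, min_loop: int = 3) -> List[List[float]]:
    n = len(seq)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) in CANONICAL:
                b[i][j] = math.exp(pair_weight)

    # Inside: Z[i][j] sums the weights of all nested structures on seq[i:j].
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]  # base j-1 unpaired
            for k in range(i, j - 1):  # base j-1 paired with k
                if b[k][j - 1] > 0.0:
                    total += Z[i][k] * b[k][j - 1] * Z[k + 1][j - 1]
            Z[i][j] = total

    # Outside: O[i][j] sums the weights of everything surrounding seq[i:j].
    O = [[0.0] * (n + 1) for _ in range(n + 1)]
    O[0][n] = 1.0
    for length in range(n, 0, -1):  # longer intervals first
        for i in range(n - length + 1):
            j = i + length
            out = O[i][j]
            O[i][j - 1] += out
            for k in range(i, j - 1):
                if b[k][j - 1] > 0.0:
                    O[i][k] += out * b[k][j - 1] * Z[k + 1][j - 1]
                    O[k + 1][j - 1] += out * Z[i][k] * b[k][j - 1]

    # P(k, l): sum over intervals [i, l + 1) in which base l closes a pair with k.
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(k + 1, n):
            if b[k][l] > 0.0:
                ctx = sum(O[i][l + 1] * Z[i][k] for i in range(k + 1))
                P[k][l] = P[l][k] = ctx * b[k][l] * Z[k + 1][l] / Z[0][n]
    return P
```

For `GAAAAC`, only (0, 5) can form. The sketch therefore returns p_05 = e/(1 + e), the weight of that single structure divided by the total.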
2.4.2. Neural Network Models
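A feedforward model that infers p_ij directly can be sketched as follows, assuming the network sees a fixed-length context window around each base of a candidate pair. The one-hot windows of ±k bases around i and j are concatenated and passed through one hidden layer and a sigmoid output. The following are all assumptions for illustration:

- the window size k
- the single ReLU hidden layer
- the zero padding at the sequence ends
- the random, untrained weights

The BiRNN variant is not shown.

```python
import math
import random
from typing import List

ALPHABET = "ACGU"


def one_hot_window(seq: str, center: int, k: int) -> List[float]:
    """One-hot encode seq[center-k .. center+k]; positions off the ends are all-zero padding."""
    feats: List[float] = []
    for pos in range(center - k, center + k + 1):
        vec = [0.0] * len(ALPHABET)
        if 0 <= pos < len(seq) and seq[pos] in ALPHABET:
            vec[ALPHABET.index(seq[pos])] = 1.0
        feats.extend(vec)
    return feats


class PairFNN:
    """Hypothetical FNN: [window(i); window(j)] -> ReLU hidden layer -> sigmoid -> p_ij."""

    def __init__(self, k: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        dim = 2 * (2 * k + 1) * len(ALPHABET)
        # Untrained random weights; training is out of scope for this sketch.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def prob(self, seq: str, i: int, j: int) -> float:
        x = one_hot_window(seq, i, self.k) + one_hot_window(seq, j, self.k)
        h = [max(0.0, sum(a * b for a, b in zip(row, x)) + bias)
             for row, bias in zip(self.w1, self.b1)]
        z = sum(a * b for a, b in zip(self.w2, h)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))

    def bpp(self, seq: str, min_loop: int = 3) -> List[List[float]]:
        """Symmetric matrix of predicted p_ij; pairs with j - i <= min_loop stay 0."""
        n = len(seq)
        P = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + min_loop + 1, n):
                P[i][j] = P[j][i] = self.prob(seq, i, j)
        return P
```

Unlike a partition-function model, nothing in this sketch forces a base's probabilities to sum to at most 1. The decoder only uses each p_ij through its MEA weight.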
2.5. Learning Algorithm
Algorithm 1 The stochastic subgradient descent algorithm for structured support vector machines (SSVMs); is the predefined learning rate.
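The stochastic subgradient loop for SSVMs can be illustrated with a toy stand-in. This sketch makes three substitutions:

- A linear scorer with one weight per canonical pair type replaces the neural network.
- The Hamming distance over base pairs serves as the structured loss.
- Loss-augmented inference uses a Nussinov-style decoder.

Each step decodes the loss-augmented structure. It then subtracts `eta` times the difference between the pair-type counts of that structure and those of the reference. The features, the loss, the absence of regularization and all names are assumptions. With a neural model, the same subgradient would presumably be backpropagated through the predicted p_ij.

```python
from functools import lru_cache
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def decode(weights: Dict[Pair, float], n: int, min_loop: int = 3) -> Set[Pair]:
    """Nested structure maximizing the sum of positive pair weights."""
    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[float, Tuple[Pair, ...]]:
        if j - i <= min_loop:
            return 0.0, ()
        score, pairs = best(i, j - 1)  # base j unpaired
        for k in range(i, j - min_loop):
            w = weights.get((k, j), 0.0)
            if w > 0.0:
                ls, lp = best(i, k - 1) if k > i else (0.0, ())
                ins, ip = best(k + 1, j - 1)
                if ls + w + ins > score:
                    score, pairs = ls + w + ins, lp + ip + ((k, j),)
        return score, pairs
    return set(best(0, n - 1)[1]) if n > 0 else set()


def ssvm_step(theta: Dict[str, float], seq: str, ref: Set[Pair],
              eta: float, min_loop: int = 3) -> Set[Pair]:
    """One stochastic subgradient step on the structured hinge loss."""
    n = len(seq)
    # Loss-augmented pair scores: Hamming loss adds +1 for each pair outside the
    # reference and -1 for each reference pair (the constant |ref| is dropped).
    aug = {}
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            t = seq[i] + seq[j]
            if t in theta:
                aug[(i, j)] = theta[t] + (-1.0 if (i, j) in ref else 1.0)
    pred = decode(aug, n, min_loop)
    # Subgradient of max_y [f(y) + loss(y, ref)] - f(ref) is counts(pred) - counts(ref).
    for i, j in pred:
        theta[seq[i] + seq[j]] -= eta
    for i, j in ref:  # reference pairs must have a pair type present in theta
        theta[seq[i] + seq[j]] += eta
    return pred


if __name__ == "__main__":
    theta = {t: 0.0 for t in ("AU", "UA", "CG", "GC", "GU", "UG")}
    seq, ref = "GGGAAAACCC", {(0, 9), (1, 8), (2, 7)}
    for _ in range(3):
        ssvm_step(theta, seq, ref, eta=0.1)
    print(theta["GC"])
```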
3. Results
3.1. Implementation
3.2. Datasets
3.3. Prediction Performance
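The tables report sensitivity (SEN), positive predictive value (PPV) and F. The sketch assumes the usual base-pair definitions, SEN = TP/(TP + FN) and PPV = TP/(TP + FP), computed for a single structure. This section does not say whether the reported figures pool counts over a dataset or average them per sequence. As a consistency check, the harmonic mean of each tabulated SEN and PPV pair matches the tabulated F value to three decimals.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each base pair as (i, j) with i < j."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def f_measure(sen: float, ppv: float) -> float:
    """Harmonic mean of sensitivity and PPV (0 when both are 0)."""
    return 2.0 * sen * ppv / (sen + ppv) if sen + ppv > 0.0 else 0.0


def evaluate(pred: Iterable[Pair], ref: Iterable[Pair]) -> Tuple[float, float, float]:
    """SEN = TP/(TP+FN), PPV = TP/(TP+FP), and F for one predicted structure."""
    p, r = normalize(pred), normalize(ref)
    tp = len(p & r)
    sen = tp / len(r) if r else 0.0
    ppv = tp / len(p) if p else 0.0
    return sen, ppv, f_measure(sen, ppv)


if __name__ == "__main__":
    print(evaluate([(0, 11), (10, 1), (4, 7)], [(0, 11), (1, 10), (2, 9), (3, 8)]))
    # Recompute F for the Neuralfold FNN row from its tabulated SEN and PPV.
    print(round(f_measure(0.600, 0.700), 3))
```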
3.4. Effects of Context Length
3.5. Comparison with Previous Methods for Prediction of Pseudoknot-Free Secondary Structures
3.6. Comparison with Alternative Methods for Predicting Pseudoknotted Secondary Structures
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
BiRNN | bi-directional recurrent neural network |
FNN | feedforward neural network |
MEA | maximum expected accuracy |
MFE | minimum free energy |
ncRNA | non-coding RNA |
PPV | positive predictive value |
SEN | sensitivity |
SSVM | structured support vector machine |

Implementation | Model | SEN | PPV | F |
---|---|---|---|---|
Neuralfold | BiRNN | 0.649 | 0.601 | 0.624 |
Neuralfold | FNN | 0.600 | 0.700 | 0.646 |
CentroidFold | McCaskill | 0.513 | 0.544 | 0.528 |

Implementation | Model | SEN | PPV | F |
---|---|---|---|---|
Neuralfold | FNN | |||
IPknot | McCaskill w/o refine. | 0.619 | 0.710 | 0.661 |
IPknot | McCaskill w/refine. | 0.753 | 0.684 | 0.717 |
IPknot | Dirks–Pierce | 0.809 | 0.749 | 0.778 |

ID | PKB229 | PKB134 | ASE_00193 | CRW_00614 | CRW_00774 |
---|---|---|---|---|---|
Length (nt) | 67 | 137 | 301 | 494 | 989 |
Neuralfold (FNN) | 3.30 s | 27.78 s | 44.73 s | 60.22 s | 3 m 4.2 s |
IPknot (w/o refine.) | 0.01 s | 0.05 s | 0.18 s | 0.55 s | 2.64 s |
IPknot (w/refine.) | 0.03 s | 0.08 s | 0.31 s | 1.03 s | 5.86 s |
IPknot (Dirks–Pierce) | 8.36 s | 9 m 4.7 s | n/a | n/a | n/a |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Akiyama, M.; Sakakibara, Y.; Sato, K. Direct Inference of Base-Pairing Probabilities with Neural Networks Improves Prediction of RNA Secondary Structures with Pseudoknots. Genes 2022, 13, 2155. https://doi.org/10.3390/genes13112155