Automatic Generation of SBML Kinetic Models from Natural Language Texts Using GPT
Abstract
1. Introduction
2. Results
2.1. GPT Alone Cannot Create Valid SBML Models
2.2. KinModGPT Can Create Valid SBML Models
2.3. Comparison with an Existing Tool
3. Discussion
4. Materials and Methods
4.1. GPT
4.2. Tellurium
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
GPT | Generative pre-trained transformer |
LLM | Large language model |
SBML | Systems Biology Markup Language |
COPASI | Complex Pathway Simulator |
CADLIVE | Computer-Aided Design of Living Systems |
INDRA | Integrated Network and Dynamical Reasoning Assembler |
Appendix A
Method | Model Name | Are SBML Models Created? | Are the Created SBML Models Valid? | Are the Created SBML Models Consistent with Their Model Descriptions? |
---|---|---|---|---|
text-davinci-003 only | Decay | Yes | No | N/A |
text-davinci-003 only | HIV | Yes | No | N/A |
text-davinci-003 only | Three-step | Yes | No | N/A |
text-davinci-003 only | Heat shock response | Yes | No | N/A |
gpt-3.5-turbo only | Decay | Yes | No | N/A |
gpt-3.5-turbo only | HIV | Yes | No | N/A |
gpt-3.5-turbo only | Three-step | Yes | No | N/A |
gpt-3.5-turbo only | Heat shock response | Yes | No | N/A |
gpt-4 only | Decay | Yes | No | N/A |
gpt-4 only | HIV | Yes | No | N/A |
gpt-4 only | Three-step | Yes | No | N/A |
gpt-4 only | Heat shock response | Yes | No | N/A |
KinModGPT (text-davinci-003) | Decay | Yes | Yes | Yes |
KinModGPT (text-davinci-003) | HIV | Yes | Yes | Yes |
KinModGPT (text-davinci-003) | Three-step | Yes | Yes | Yes |
KinModGPT (text-davinci-003) | Heat shock response | Yes | Yes | Yes |
KinModGPT (gpt-3.5-turbo) | Decay | Yes | Yes | Yes |
KinModGPT (gpt-3.5-turbo) | HIV | Yes | Yes | Yes |
KinModGPT (gpt-3.5-turbo) | Three-step | Yes | Yes | No |
KinModGPT (gpt-3.5-turbo) | Heat shock response | No | N/A | N/A |
KinModGPT (gpt-4) | Decay | Yes | Yes | Yes |
KinModGPT (gpt-4) | HIV | Yes | Yes | Yes |
KinModGPT (gpt-4) | Three-step | Yes | Yes | Yes |
KinModGPT (gpt-4) | Heat shock response | Yes | Yes | Yes |
INDRA | Decay | Yes | Yes | No |
INDRA | HIV | Yes | Yes | No |
INDRA | Three-step | No | N/A | N/A |
INDRA | Heat shock response | No | N/A | N/A |
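
The table above can be condensed into per-method counts. The following is a minimal sketch, not part of the original study: it transcribes the Yes/No/N/A entries by hand and counts, for each method, how many of the four test models were created, were valid, and were consistent with their descriptions. The names `RESULTS` and `tally` are hypothetical helpers introduced only for this illustration.

```python
# Outcomes transcribed from the table, in model order:
# Decay, HIV, Three-step, Heat shock response.
# Each tuple is (created, valid, consistent); None stands for "N/A".
Y, N, NA = True, False, None

RESULTS = {
    "text-davinci-003 only": [(Y, N, NA)] * 4,
    "gpt-3.5-turbo only": [(Y, N, NA)] * 4,
    "gpt-4 only": [(Y, N, NA)] * 4,
    "KinModGPT (text-davinci-003)": [(Y, Y, Y)] * 4,
    "KinModGPT (gpt-3.5-turbo)": [(Y, Y, Y), (Y, Y, Y), (Y, Y, N), (N, NA, NA)],
    "KinModGPT (gpt-4)": [(Y, Y, Y)] * 4,
    "INDRA": [(Y, Y, N), (Y, Y, N), (N, NA, NA), (N, NA, NA)],
}


def tally(results):
    """Return {method: (n_created, n_valid, n_consistent)}, counting only "Yes" cells."""
    summary = {}
    for method, outcomes in results.items():
        created = sum(1 for c, _, _ in outcomes if c is True)
        valid = sum(1 for _, v, _ in outcomes if v is True)
        consistent = sum(1 for _, _, k in outcomes if k is True)
        summary[method] = (created, valid, consistent)
    return summary


if __name__ == "__main__":
    for method, (created, valid, consistent) in tally(RESULTS).items():
        print(f"{method}: created {created}/4, valid {valid}/4, consistent {consistent}/4")
```

Counting "N/A" as not satisfied is a simplifying assumption of this sketch; the table itself treats those cells as not applicable rather than as failures.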
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Maeda, K.; Kurata, H. Automatic Generation of SBML Kinetic Models from Natural Language Texts Using GPT. Int. J. Mol. Sci. 2023, 24, 7296. https://doi.org/10.3390/ijms24087296