Predictive Modelling in Clinical Bioinformatics: Key Concepts for Startups
Abstract
1. Clinical Bioinformatics Role and its Dependency on Predictive Modelling
2. Key Concepts of Predictive Modelling in the Clinical Context
- Biologically relevant entities that can be measured or observed (independent variables), which are the inputs of the model.
- Relational factors between variables, with or without biological meaning (parameters), which can be estimated empirically or by data-fitting methods.
- Unknown clinical entities or properties of interest (dependent variables), which are the outputs that the model predicts.
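The three components listed above can be made concrete with a small sketch. It is an illustration under stated assumptions, not a method taken from this article: the biomarker values, the disease labels, and the function names `predict` and `fit` are all hypothetical, and a logistic model fitted by plain gradient descent is just one possible choice.

```python
import math

def predict(x, params):
    """Model output (dependent variable): estimated probability of disease."""
    w, b = params  # parameters: relational factors linking input to output
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def fit(xs, ys, lr=0.1, epochs=5000):
    """Estimate the parameters by data fitting (gradient descent on log-loss)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(x, (w, b)) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data, invented for illustration only.
biomarker = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]  # inputs (independent variable)
disease = [0, 0, 0, 1, 0, 1, 1, 1]                    # known outcomes used for fitting

params = fit(biomarker, disease)
risk_low = predict(1.0, params)   # prediction for a low biomarker level
risk_high = predict(4.0, params)  # prediction for a high biomarker level
print(f"w={params[0]:.2f}, b={params[1]:.2f}, "
      f"risk(1.0)={risk_low:.2f}, risk(4.0)={risk_high:.2f}")
```

Here the measured biomarker is the input, `w` and `b` are parameters without direct biological meaning, and the predicted probability is the output. In a kinetic model the parameters would instead be rate constants, which do carry biological meaning.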
3. Examples of Clinical Applications of Predictive Modelling
4. Choosing the Correct Modelling Framework
5. Challenges of Clinical Bioinformatics: The Business Perspective
6. Conclusions and Perspectives
Funding
Data Availability Statement
Conflicts of Interest
Modelling Technique | Description | Application | Requirements |
---|---|---|---|
Statistical | Scoring and probability functions that assume a distribution shape or behaviour. | Continuous Quantification | Data for parameter estimation. Depends on sample size. |
Kinetic | Solves systems of nonlinear differential equations. Does not assume any behaviour; instead relies on rate laws of processes such as chemical reactions. | Binary Classification | Requires reported or estimated kinetic parameters. Does not depend on sample size. |
Logical | Solves logical equations based on predefined rules for each component. Assumes an asynchronous or synchronous update scheme. | Binary Classification | Requires relational knowledge of its components. Does not depend on sample size. |
Regression | Fits an assumed mathematical equation to data. The models used often describe a particular assumed data behaviour, such as linear, polynomial, exponential, or logistic. | Binary Classification | Data for model fitting. Depends on sample size. |
Random Forests | Supervised machine learning algorithm based on averaging multiple generated decision trees. | Binary Classification | Data for model training and validation. Requires large datasets. |
Support Vector Machines | Supervised machine learning algorithm that finds the maximum-margin boundary (hyperplane) separating classes, optionally after a kernel transformation of the data. | Binary Classification | Data for model training and validation. Requires large datasets. |
Neural Networks | Supervised machine learning algorithm based on defining a set of neurons and layers as model components. Assumes all possible relational interactions between neurons. | Binary Classification | Data for model training and validation. Requires large datasets. |
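
To illustrate the "Logical" row of the table, the following minimal sketch runs a Boolean model with a synchronous update scheme. The three-node network, its node names, and its rules are hypothetical and were invented for this example; they do not come from any model cited in the article.

```python
from itertools import product

# Hypothetical three-node network; names and rules are illustrative only.
RULES = {
    "signal": lambda s: s["signal"],                           # external input, held constant
    "inhibitor": lambda s: not s["signal"],                    # switched off by the signal
    "response": lambda s: s["signal"] and not s["inhibitor"],  # needs signal and no inhibitor
}

def step(state):
    """Synchronous update: every node is recomputed from the same previous state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}

def run_to_fixed_point(state, max_steps=20):
    """Iterate until the state stops changing; return None if it never does."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return None  # no stable state reached (for example, a cyclic attractor)

def stable_states():
    """Enumerate every possible state and keep those that map onto themselves."""
    nodes = list(RULES)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            found.append(state)
    return found

start = {"signal": True, "inhibitor": True, "response": False}
final = run_to_fixed_point(start)
print(final)
print(len(stable_states()))
```

The sketch needs only the rules that relate the components and no training data, which matches the table's point that logical models do not depend on sample size. An asynchronous scheme would instead update one node at a time and can reach different attractors.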
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pais, R.J. Predictive Modelling in Clinical Bioinformatics: Key Concepts for Startups. BioTech 2022, 11, 35. https://doi.org/10.3390/biotech11030035