Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis
Simple Summary
Abstract
1. Introduction
2. Methods
3. Background
3.1. Multimodal Data
3.1.1. Overview of Multimodal Data
3.1.2. Clinical Data
3.1.3. Molecular Data
3.1.4. Image Data
3.2. Machine Learning
3.2.1. Model Evaluation and Performance Metrics
3.2.2. Dimensionality Reduction
Multi-Omics Pre-Processing and Dimensionality Reduction
3.3. Data Integration
Multi-Omics Integration
4. Results
4.1. Sample Size and Cancer Type
4.2. Multimodal Data
4.2.1. Clinical Data
4.2.2. Molecular Data
4.2.3. Image Data
4.3. Data Integration
4.4. Predictive Models for Cancer Prognosis Prediction
4.4.1. Conventional Survival Analysis
4.4.2. Machine Learning-Based Approaches
4.4.3. Mixed Approaches
4.5. Data Sources
5. Discussion
- I. Lack of data. Although efforts have been made to collect varied cancer data extensively and share them with the scientific community (as discussed in Section 4.5), the amount of data is still insufficient. The cancer-related data sets found in this review contain hundreds to thousands of observations, far fewer than data sets from other areas (e.g., finance), which usually contain tens of thousands of observations [113]. According to the curse of dimensionality, the amount of data required to develop models that give statistically reliable results grows exponentially with the dimensionality. Therefore, survival predictive models would improve by increasing not only the sample size but also the follow-up time of patients.
- II. Only a few multimodal data sets are publicly available. Access to most existing multimodal data is restricted to the hospitals or research centres that own them. Changes in data privacy legislation, together with techniques that protect sensitive medical data by computing on encrypted data, are paramount to promoting predictive analysis of private databases [114].
- III. Heterogeneity in data. Heterogeneity is present at many levels. Firstly, the data sets gather information from patients with different demographics, cancer types, and treatments. Although a representative sample of the population is key to training models that generalise well, this adds heterogeneity that must be handled properly, especially when it entails an imbalance in the number of patients across classes or characteristics (e.g., the information from patients with a rare cancer subtype will likely not be captured by the algorithm). Secondly, the multimodality of the data considered in this review inherently entails heterogeneity, and the data sets from the reviewed papers seldom gather all four main types of data discussed in Section 3. Thirdly, whenever possible, models should be able to deal with missing data. Even within a single data set, many patients will have missing data, as not all of them undergo the same tests or the same follow-up period. Fourthly, the experimental techniques used to gather the anatomopathological, non-omics, and omics data are extremely varied, which influences the amount and quality of the data, their format, and the pre-processing needed.
- IV. Data integration. The availability of multi-omics data has brought about a breakthrough in information analysis techniques. The complexity of these techniques and the difficulty of choosing the optimal ones for each case require the collaborative effort of multidisciplinary teams that include experts in Data Analysis, Statistics, and Machine Learning who can guide and support the data treatment and the development of robust and generalisable models.
- V. Lack of external validation. Another limitation is the lack of independent data sets with which to externally validate the generated models. External validation is paramount to detect potential issues such as bias or overfitting and to demonstrate the generalisation capability of models [24]. Many studies do not validate their predictive models on independent data sets, although there is an increasing trend to do so. Fortunately, the accessibility of cancer-related data sets is improving every day.
- VI. Most studies are single-institution and retrospective, while multi-centric and prospective studies are very scarce. Multi-centric studies often yield data sets with a larger number of observations, and the data collected tend to reflect more accurately the variety of features displayed by the subjects of a population. Additionally, new data gathered in prospective studies could be useful for validating predictive models trained on the initial data.
- VII. Difficulty in comparing state-of-the-art models. Experimental replicability and reproducibility are pivotal topics in ML. The reviewed articles use no unified performance measure, which makes a fair quantitative comparison almost impossible. Further efforts should be made to establish common evaluation practices, and to identify those that should be avoided, so that results can be compared fairly with the state of the art [115].
- VIII. The ‘black box’ problem. Most ML algorithms operate in such a complex manner that understanding how information is processed becomes challenging, thus turning the trained models into opaque systems [116]. Naturally, non-expert audiences cannot completely trust them with tasks as important as the management of patients. However, the rise of Explainable Artificial Intelligence (XAI) is contributing to solving this problem and paving the way for the application of ML models in clinical practice [26].
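As an illustration of points V and VII, the sketch below computes Harrell's concordance index (C-index), a measure commonly reported in survival analysis, on two cohorts: one standing in for internal validation and one for external validation. This is a minimal sketch under stated assumptions, not the method of any reviewed study. The function names `concordance_index` and `simulate_cohort`, the synthetic cohorts, and the noise levels are all hypothetical and chosen only for illustration.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the shorter observed survival has the higher predicted risk.
    Ties in predicted risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: patient i had an observed event strictly
            # before patient j's event or censoring time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def simulate_cohort(n, noise, seed):
    """Synthetic cohort (assumption): a latent risk shortens survival, and the
    model's predicted risk is the latent risk plus Gaussian noise."""
    rng = random.Random(seed)
    times, events, risks = [], [], []
    for _ in range(n):
        true_risk = rng.random()
        event_time = rng.expovariate(0.5 + 2.0 * true_risk)
        censor_time = rng.expovariate(0.5)  # independent censoring
        times.append(min(event_time, censor_time))
        events.append(event_time <= censor_time)
        risks.append(true_risk + rng.gauss(0.0, noise))
    return times, events, risks


if __name__ == "__main__":
    # Hypothetical scenario: predictions are noisier on the external cohort,
    # mimicking a model that transfers imperfectly to another institution.
    internal = simulate_cohort(200, noise=0.1, seed=0)
    external = simulate_cohort(200, noise=0.5, seed=1)
    print(f"internal C-index: {concordance_index(*internal):.3f}")
    print(f"external C-index: {concordance_index(*external):.3f}")
```

Reporting the same measure on both an internal and an independent cohort is one way to make results comparable across studies. A C-index near 0.5 indicates predictions no better than chance, and a value near 1.0 indicates near-perfect ranking of patients by risk.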
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249.
- International Agency for Research on Cancer. Cancer Today: Data Visualization Tools for Exploring the Global Cancer Burden in 2018. Available online: Gco.iarc.fr (accessed on 24 September 2020).
- Wild, C.P. The Global Cancer Burden: Necessity Is the Mother of Prevention. Nat. Rev. Cancer 2019, 19, 123–124.
- Loomans-Kropp, H.A.; Umar, A. Cancer Prevention and Screening: The next Step in the Era of Precision Medicine. Npj Precis. Oncol. 2019, 3, 3.
- Wild, C.P.; Espina, C.; Bauld, L.; Bonanni, B.; Brenner, H.; Brown, K.; Dillner, J.; Forman, D.; Kampman, E.; Nilbert, M.; et al. Cancer Prevention Europe. Mol. Oncol. 2019, 13, 528–534.
- Ahmed, A.A.; Abedalthagafi, M. Cancer Diagnostics: The Journey from Histomorphology to Molecular Profiling. Oncotarget 2016, 7, 58696–58708.
- Falzone, L.; Salomone, S.; Libra, M. Evolution of Cancer Pharmacological Treatments at the Turn of the Third Millennium. Front. Pharmacol. 2018, 9, 1300.
- Li, X.; Warner, J.L. A Review of Precision Oncology Knowledgebases for Determining the Clinical Actionability of Genetic Variants. Front. Cell Dev. Biol. 2020, 8, 48.
- Doherty, G.J.; Petruzzelli, M.; Beddowes, E.; Ahmad, S.S.; Caldas, C.; Gilbertson, R.J. Cancer Treatment in the Genomic Era. Annu. Rev. Biochem. 2019, 88, 247–280.
- Zhu, W.; Xie, L.; Han, J.; Guo, X. The Application of Deep Learning in Cancer Prognosis Prediction. Cancers 2020, 12, 603.
- Gress, D.M.; Edge, S.B.; Greene, F.L.; Washington, M.K.; Asare, E.A.; Brierley, J.D.; Byrd, D.R.; Compton, C.C.; Jessup, J.M.; Winchester, D.P.; et al. Principles of Cancer Staging. In AJCC Cancer Staging Manual; Amin, M.B., Edge, S.B., Greene, F.L., Byrd, D.R., Brookland, R.K., Washington, M.K., Gershenwald, J.E., Compton, C.C., Hess, K.R., Sullivan, D.C., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 3–30. ISBN 978-3-319-40617-6.
- Wang, P.; Li, Y.; Reddy, C.K. Machine Learning for Survival Analysis: A Survey. ACM Comput. Surv. 2017, 51, 1–36.
- Maji, P. Recent Advances in Multimodal Big Data Analysis for Cancer Diagnosis. CSI Trans. 2019, 7, 227–231.
- Goel, M.K.; Khanna, P.; Kishore, J. Understanding Survival Analysis: Kaplan-Meier Estimate. Int. J. Ayurveda Res. 2010, 1, 274–278.
- Peto, R.; Peto, J. Asymptotically Efficient Rank Invariant Test Procedures. J. R. Stat. Soc. Ser. A 1972, 135, 185–198.
- Mantel, N. Evaluation of Survival Data and Two New Rank Order Statistics Arising in Its Consideration. Cancer Chemother. Rep. 1966, 50, 163–170.
- Cox, D.R. Regression Models and Life-Tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220.
- Bewick, V.; Cheek, L.; Ball, J. Statistics Review 12: Survival Analysis. Crit. Care 2004, 8, 389–394.
- Gao, Y.; Zhou, R.; Lyu, Q. Multiomics and Machine Learning in Lung Cancer Prognosis. J. Thorac. Dis. 2020, 12, 4531–4535.
- Burki, T.K. Predicting Lung Cancer Prognosis Using Machine Learning. Lancet Oncol. 2016, 17, e421.
- Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine Learning Applications in Cancer Prognosis and Prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
- Nicora, G.; Vitali, F.; Dagliati, A.; Geifman, N.; Bellazzi, R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front. Oncol. 2020, 10, 1030.
- Tufail, A.B.; Ma, Y.-K.; Kaabar, M.K.A.; Martínez, F.; Junejo, A.R.; Ullah, I.; Khan, R. Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions. Comput. Math. Methods Med. 2021, 2021, 9025470.
- Cruz, J.A.; Wishart, D.S. Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer Inform. 2006, 2, 59–77.
- Okser, S.; Pahikkala, T.; Aittokallio, T. Genetic Variants and Their Interactions in Disease Risk Prediction—Machine Learning and Network Perspectives. BioData Min. 2013, 6, 5.
- Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115.
- Li, Y.; Wu, F.X.; Ngom, A. A Review on Machine Learning Principles for Multi-View Biological Data Integration. Brief. Bioinform. 2018, 19, 325–340.
- Hasin, Y.; Seldin, M.; Lusis, A. Multi-Omics Approaches to Disease. Genome Biol. 2017, 18, 1–15.
- Clancy, S. Genetic Mutation. Nat. Educ. 2008, 1, 187–188.
- Yi, K.; Ju, Y.S. Patterns and Mechanisms of Structural Variations in Human Cancer. Exp. Mol. Med. 2018, 50, 1–11.
- van Dijk, E.L.; Jaszczyszyn, Y.; Naquin, D.; Thermes, C. The Third Revolution in Sequencing Technology. Trends Genet. 2018, 34, 666–681.
- Rauluseviciute, I.; Drabløs, F.; Rye, M.B. DNA Methylation Data by Sequencing: Experimental Approaches and Recommendations for Tools and Pipelines for Data Analysis. Clin. Epigenetics 2019, 11, 193.
- Taft, R.J.; Pang, K.C.; Mercer, T.R.; Dinger, M.; Mattick, J.S. Non-Coding RNAs: Regulators of Disease. J. Pathol. 2010, 220, 126–139.
- Boellner, S.; Becker, K.-F. Reverse Phase Protein Arrays—Quantitative Assessment of Multiple Biomarkers in Biopsies for Clinical Use. Microarrays 2015, 4, 98–114.
- Orakpoghenor, O.; Avazi, D.O.; Markus, T.P.; Olaolu, O.S. A Short Review of Immunochemistry. Immunogenet. Open Access 2018, 3, 122.
- Matsuda, K. Chapter Two-PCR-Based Detection Methods for Single-Nucleotide Polymorphism or Mutation: Real-Time PCR and Its Substantial Contribution Toward Technological Refinement. In Advances in Clinical Chemistry; Makowski, G.S., Ed.; Elsevier: Amsterdam, The Netherlands, 2017; Volume 80, pp. 45–72.
- Fass, L. Imaging and Cancer: A Review. Mol. Oncol. 2008, 2, 115–152.
- Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; Forster, K.; Aerts, H.J.W.L.; Dekker, A.; Fenstermacher, D.; et al. Radiomics: The Process and the Challenges. Magn. Reson. Imaging 2012, 30, 1234–1248.
- Zhong, G.; Ling, X.; Wang, L.-N. From Shallow Feature Learning to Deep Learning: Benefits from the Width and Depth of Deep Architectures. WIREs Data Min. Knowl. Discov. 2019, 9, e1255.
- Niknejad, A.; Petrovic, D. Introduction to Computational Intelligence Techniques and Areas of Their Applications in Medicine. Med. Appl. Artif. Intell. 2013, 51, 2113–2119.
- Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A Guide to Deep Learning in Healthcare. Nat. Med. 2019, 25, 24–29.
- Tan, P.-N.; Steinbach, M.; Kumar, V. Data Mining Introduction; The People Post and Telecommunications Press: Beijing, China, 2006.
- Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv 2018, arXiv:1811.12808.
- Venkatesh, B.; Anuradha, J. A Review of Feature Selection and Its Methods. Cybern. Inf. Technol. 2019, 19, 3–26.
- Zebari, R.; Abdulazeez, A.M.; Zeebaree, D.Q.; Zebari, D.A.; Saeed, J.N. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70.
- Sharma, N.; Saroha, K. A Novel Dimensionality Reduction Method for Cancer Dataset Using PCA and Feature Ranking. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Kochi, India, 10–13 August 2015; pp. 2261–2264.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
- Adossa, N.; Khan, S.; Rytkönen, K.T.; Elo, L.L. Computational Strategies for Single-Cell Multi-Omics Integration. Comput. Struct. Biotechnol. J. 2021, 19, 2588–2596.
- Mirza, B.; Wang, W.; Wang, J.; Choi, H.; Chung, N.C.; Ping, P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes 2019, 10, 87.
- Bersanelli, M.; Mosca, E.; Remondini, D.; Giampieri, E.; Sala, C.; Castellani, G.; Milanesi, L. Methods for the Integration of Multi-Omics Data: Mathematical Aspects. BMC Bioinform. 2016, 17, S15.
- Liew, A.W.-C.; Law, N.-F.; Yan, H. Missing Value Imputation for Gene Expression Data: Computational Techniques to Recover Missing Data from Available Information. Brief. Bioinform. 2011, 12, 498–513.
- Vivian, J.; Eizenga, J.M.; Beale, H.C.; Vaske, O.M.; Paten, B. Bayesian Framework for Detecting Gene Expression Outliers in Individual Samples. JCO Clin. Cancer Inform. 2020, 4, 160–170.
- Reel, P.S.; Reel, S.; Pearson, E.; Trucco, E.; Jefferson, E. Using Machine Learning Approaches for Multi-Omics Data Analysis: A Review. Biotechnol. Adv. 2021, 49, 107739.
- Rappoport, N.; Shamir, R. Multi-Omic and Multi-View Clustering Algorithms: Review and Cancer Benchmark. Nucleic Acids Res. 2018, 46, 10546–10562.
- Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-Omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insights 2020, 14, 117793221989905.
- Cantini, L.; Zakeri, P.; Hernandez, C.; Naldi, A.; Thieffry, D.; Remy, E.; Baudot, A. Benchmarking Joint Multi-Omics Dimensionality Reduction Approaches for the Study of Cancer. Nat. Commun. 2021, 12, 124.
- Huang, S.; Chaudhary, K.; Garmire, L.X. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front. Genet. 2017, 8, 84.
- Gligorijević, V.; Pržulj, N. Methods for Biological Data Integration: Perspectives and Challenges. J. R. Soc. Interface 2015, 12, 20150571.
- Picard, M.; Scott-Boyer, M.-P.; Bodein, A.; Périn, O.; Droit, A. Integration Strategies of Multi-Omics Data for Machine Learning Analysis. Comput. Struct. Biotechnol. J. 2021, 19, 3735–3746.
- Ritchie, M.D.; Holzinger, E.R.; Li, R.; Pendergrass, S.A.; Kim, D. Methods of Integrating Data to Uncover Genotype–Phenotype Interactions. Nat. Rev. Genet. 2015, 16, 85–97.
- Chaudhary, K.; Poirion, O.B.; Lu, L.; Garmire, L.X. Deep Learning–Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin. Cancer Res. 2018, 24, 1248–1259.
- Xie, G.; Dong, C.; Kong, Y.; Zhong, J.F.; Li, M.; Wang, K. Group Lasso Regularized Deep Learning for Cancer Prognosis from Multi-Omics and Clinical Features. Genes 2019, 10, 240.
- Altenbuchinger, M.; Weihs, A.; Quackenbush, J.; Grabe, H.J.; Zacharias, H.U. Gaussian and Mixed Graphical Models as (Multi-)Omics Data Analysis Tools. Biochim. Biophys. Acta (BBA)-Gene Regul. Mech. 2020, 1863, 194418.
- Zierer, J.; Pallister, T.; Tsai, P.-C.; Krumsiek, J.; Bell, J.T.; Lauc, G.; Spector, T.D.; Menni, C.; Kastenmüller, G. Exploring the Molecular Basis of Age-Related Disease Comorbidities Using a Multi-Omics Graphical Model. Sci. Rep. 2016, 6, 37646.
- Huh, R.; Yang, Y.; Jiang, Y.; Shen, Y.; Li, Y. SAME-Clustering: Single-Cell Aggregated Clustering via Mixture Model Ensemble. Nucleic Acids Res. 2020, 48, 86–95.
- Hoadley, K.A.; Yau, C.; Wolf, D.M.; Cherniack, A.D.; Tamborero, D.; Ng, S.; Leiserson, M.D.M.; Niu, B.; McLellan, M.D.; Uzunangelov, V.; et al. Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin. Cell 2014, 158, 929–944.
- Cabassi, A.; Kirk, P.D.W. Multiple Kernel Learning for Integrative Consensus Clustering of Omic Datasets. Bioinformatics 2020, 36, 4789–4796.
- Zhu, X.; Yao, J.; Luo, X.; Xiao, G.; Xie, Y.; Gazdar, A.; Huang, J. Lung Cancer Survival Prediction from Pathological Images and Genetic Data—An Integration Study. In Proceedings of the International Symposium on Biomedical Imaging, Prague, Czech Republic, 13–16 June 2016; pp. 1173–1176.
- Cheng, J.; Zhang, J.; Han, Y.; Wang, X.; Ye, X.; Meng, Y.; Parwani, A.; Han, Z.; Feng, Q.; Huang, K. Integrative Analysis of Histopathological Images and Genomic Data Predicts Clear Cell Renal Cell Carcinoma Prognosis. Cancer Res. 2017, 77, e91–e100.
- Candido dos Reis, F.J.; Wishart, G.C.; Dicks, E.M.; Greenberg, D.; Rashbass, J.; Schmidt, M.K.; van den Broek, A.J.; Ellis, I.O.; Green, A.; Rakha, E.; et al. An Updated PREDICT Breast Cancer Prognostication and Treatment Benefit Prediction Model with Independent Validation. Breast Cancer Res. 2017, 19, 58.
- Sperduto, P.W.; Yang, T.J.; Beal, K.; Pan, H.; Brown, P.D.; Bangdiwala, A.; Shanley, R.; Yeh, N.; Gaspar, L.E.; Braunstein, S.; et al. Estimating Survival in Patients with Lung Cancer and Brain Metastases an Update of the Graded Prognostic Assessment for Lung Cancer Using Molecular Markers (Lung-MolGPA). JAMA Oncol. 2017, 3, 827–831.
- Elwood, M.; Tin, S.T.; Tawfiq, E.; Marshall, R.J.; Phung, T.M.; Lawrenson, R.; Campbell, I.; Harvey, V. A New Predictive Model for Breast Cancer Survival in New Zealand: Development, Internal and External Validation, and Comparison with the Nottingham Prognostic Index. J. Glob. Oncol. 2018, 4, 227s.
- Matsuo, K.; Purushotham, S.; Jiang, B.; Mandelbaum, R.S.; Takiuchi, T.; Liu, Y.; Roman, L.D. Survival Outcome Prediction in Cervical Cancer: Cox Models vs Deep-Learning Model. Am. J. Obstet. Gynecol. 2019, 220, 381.e1–381.e14.
- Mohebian, M.R.; Marateb, H.R.; Mansourian, M.; Mañanas, M.A.; Mokarian, F. A Hybrid Computer-Aided-Diagnosis System for Prediction of Breast Cancer Recurrence (HPBCR) Using Optimized Ensemble Learning. Comput. Struct. Biotechnol. J. 2017, 15, 75–85.
- Obrzut, B.; Kusy, M.; Semczuk, A.; Obrzut, M.; Kluska, J. Prediction of 5-Year Overall Survival in Cervical Cancer Patients Treated with Radical Hysterectomy Using Computational Intelligence Methods. BMC Cancer 2017, 17, 840.
- Zhu, B.; Song, N.; Shen, R.; Arora, A.; Machiela, M.J.; Song, L.; Landi, M.T.; Ghosh, D.; Chatterjee, N.; Baladandayuthapani, V.; et al. Integrating Clinical and Multiple Omics Data for Prognostic Assessment across Human Cancers. Sci. Rep. 2017, 7, 16954.
- Sun, D.; Li, A.; Tang, B.; Wang, M. Integrating Genomic Data and Pathological Images to Effectively Predict Breast Cancer Clinical Outcome. Comput. Methods Programs Biomed. 2018, 161, 45–53.
- Zhang, L.; Lv, C.; Jin, Y.; Cheng, G.; Fu, Y.; Yuan, D.; Tao, Y.; Guo, Y.; Ni, X.; Shi, T. Deep Learning-Based Multi-Omics Data Integration Reveals Two Prognostic Subtypes in High-Risk Neuroblastoma. Front. Genet. 2018, 9, 477.
- Zhao, M.; Tang, Y.; Kim, H.; Hasegawa, K. Machine Learning with K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients with Breast Cancer. Cancer Inform. 2018, 17, 1176935118810215.
- Cheerla, A.; Gevaert, O. Deep Learning with Multimodal Representation for Pancancer Prognosis Prediction. Bioinformatics 2019, 35, i446–i454.
- Ferroni, P.; Zanzotto, F.M.; Riondino, S.; Scarpato, N.; Guadagni, F.; Roselli, M. Breast Cancer Prognosis Using a Machine Learning Approach. Cancers 2019, 11, 328.
- Jing, B.; Zhang, T.; Wang, Z.; Jin, Y.; Liu, K.; Qiu, W.; Ke, L.; Sun, Y.; He, C.; Hou, D.; et al. A Deep Survival Analysis Method Based on Ranking. Artif. Intell. Med. 2019, 98, 1–9.
- Sun, D.; Wang, M.; Li, A. A Multimodal Deep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi-Dimensional Data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 16, 841–850.
- Tapak, L.; Shirmohammadi-Khorram, N.; Amini, P.; Alafchi, B.; Hamidi, O.; Poorolajal, J. Prediction of Survival and Metastasis in Breast Cancer Patients Using Machine Learning Classifiers. Clin. Epidemiol. Glob. Health 2019, 7, 293–299.
- Baek, B.; Lee, H. Prediction of Survival and Recurrence in Patients with Pancreatic Cancer by Integrating Multi-Omics Data. Sci. Rep. 2020, 10, 18951.
- Boeri, C.; Chiappa, C.; Galli, F.; Berardinis, V.D.; Bardelli, L.; Carcano, G.; Rovera, F. Machine Learning Techniques in Breast Cancer Prognosis Prediction: A Primary Evaluation. Cancer Med. 2020, 9, 3234–3243.
- Choi, Y.S.; Ahn, S.S.; Chang, J.H.; Kang, S.G.; Kim, E.H.; Kim, S.H.; Jain, R.; Lee, S.K. Machine Learning and Radiomic Phenotyping of Lower Grade Gliomas: Improving Survival Prediction. Eur. Radiol. 2020, 30, 3834–3842.
- Zhang, Y.; Li, A.; He, J.; Wang, M. A Novel MKL Method for GBM Prognosis Prediction by Integrating Histopathological Image and Multi-Omics Data. IEEE J. Biomed. Health Inform. 2020, 24, 171–179.
- Arya, N.; Saha, S. Multi-Modal Classification for Human Breast Cancer Prognosis Prediction: Proposal of Deep-Learning Based Stacked Ensemble Model. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 19, 1032–1041.
- Tong, L.; Mitchel, J.; Chatlin, K.; Wang, M.D. Deep Learning Based Feature-Level Integration of Multi-Omics Data for Breast Cancer Patients Survival Analysis. BMC Med. Inform. Decis. Mak. 2020, 20, 225.
- Owens, A.R.; McInerney, C.E.; Prise, K.M.; McArt, D.G.; Jurek-Loughrey, A. Novel Deep Learning-Based Solution for Identification of Prognostic Subgroups in Liver Cancer (Hepatocellular Carcinoma). BMC Bioinform. 2021, 22, 563.
- Malik, V.; Kalakoti, Y.; Sundar, D. Deep Learning Assisted Multi-Omics Integration for Survival and Drug-Response Prediction in Breast Cancer. BMC Genom. 2021, 22, 214.
- Zhao, L.; Dong, Q.; Luo, C.; Wu, Y.; Bu, D.; Qi, X.; Luo, Y.; Zhao, Y. DeepOmix: A Scalable and Interpretable Multi-Omics Deep Learning Framework and Application in Cancer Survival Analysis. Comput. Struct. Biotechnol. J. 2021, 19, 2719–2725.
- Hassanzadeh, H.R.; Wang, M.D. An Integrated Deep Network for Cancer Survival Prediction Using Omics Data. Front. Big Data 2021, 4, 41.
- Zhang, X.; Xing, Y.; Sun, K.; Guo, Y. Omiembed: A Unified Multi-Task Deep Learning Framework for Multi-Omics Data. Cancers 2021, 13, 3047.
- Chharia, A.; Kumar, N. Foreseeing Survival Through ‘Fuzzy Intelligence’: A Cognitively-Inspired Incremental Learning Based de Novo Model for Breast Cancer Prognosis by Multi-Omics Data Fusion. Lect. Notes Comput. Sci. 2021, 12928, 231–242.
- Yousefi, S.; Amrollahi, F.; Amgad, M.; Dong, C.; Lewis, J.E.; Song, C.; Gutman, D.A.; Halani, S.H.; Vega, J.E.V.; Brat, D.J.; et al. Predicting Clinical Outcomes from Large Scale Cancer Genomic Profiles with Deep Survival Models. Sci. Rep. 2017, 7, 11707.
- Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized Treatment Recommender System Using a Cox Proportional Hazards Deep Neural Network. BMC Med. Res. Methodol. 2018, 18, 24.
- Mobadersany, P.; Yousefi, S.; Amgad, M.; Gutman, D.A.; Barnholtz-Sloan, J.S.; Velázquez Vega, J.E.; Brat, D.J.; Cooper, L.A.D. Predicting Cancer Outcomes from Histology and Genomics Using Convolutional Networks. Proc. Natl. Acad. Sci. USA 2018, 115, E2970–E2979.
- Huang, Z.; Zhan, X.; Xiang, S.; Johnson, T.S.; Helm, B.; Yu, C.Y.; Zhang, J.; Salama, P.; Rizkalla, M.; Han, Z.; et al. Salmon: Survival Analysis Learning with Multi-Omics Neural Networks on Breast Cancer. Front. Genet. 2019, 10, 166.
- Wang, S.; Liu, Z.; Rong, Y.; Zhou, B.; Bai, Y.; Wei, W.; Wei, W.; Wang, M.; Guo, Y.; Tian, J. Deep Learning Provides a New Computed Tomography-Based Prognostic Biomarker for Recurrence Prediction in High-Grade Serous Ovarian Cancer. Radiother. Oncol. 2019, 132, 171–177.
- Shao, W.; Wang, T.; Sun, L.; Dong, T.; Han, Z.; Huang, Z.; Zhang, J.; Zhang, D.; Huang, K. Multi-Task Multi-Modal Learning for Joint Diagnosis and Prognosis of Human Cancers. Med. Image Anal. 2020, 65, 101795.
- Chen, R.J.; Lu, M.Y.; Wang, J.; Williamson, D.F.K.; Rodig, S.J.; Lindeman, N.I.; Mahmood, F. Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis. IEEE Trans. Med. Imaging 2020, 4, 757–770.
- Hao, J.; Kosaraju, S.C.; Tsaku, N.Z.; Song, D.H.; Kang, M. PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data. Pac. Symp. Biocomput. 2020, 25, 355–366.
- Ning, Z.; Pan, W.; Chen, Y.; Xiao, Q.; Zhang, X.; Luo, J.; Wang, J.; Zhang, Y. Integrative Analysis of Cross-Modal Features for the Prognosis Prediction of Clear Cell Renal Cell Carcinoma. Bioinformatics 2020, 36, 2888–2895.
- Chai, H.; Zhou, X.; Zhang, Z.; Rao, J.; Zhao, H.; Yang, Y. Integrating Multi-Omics Data through Deep Learning for Accurate Cancer Prognosis Prediction. Comput. Biol. Med. 2021, 134, 104481.
- Vale-Silva, L.A.; Rohr, K. Long-Term Cancer Survival Prediction Using Multimodal Deep Learning. Sci. Rep. 2021, 11, 13505.
- Wang, W.; Zhang, X.; Dai, D.-Q. Defusion: A Denoised Network Regularization Framework for Multi-Omics Integration. Brief. Bioinform. 2021, 22, bbab057.
- Poirion, O.B.; Jing, Z.; Chaudhary, K.; Huang, S.; Garmire, L.X. DeepProg: An Ensemble of Deep-Learning and Machine-Learning Models for Prognosis Prediction Using Multi-Omics Data. Genome Med. 2021, 13, 112.
- Zhang, X.; Wang, J.; Lu, J.; Su, L.; Wang, C.; Huang, Y.; Zhang, X.; Zhu, X. Robust Prognostic Subtyping of Muscle-Invasive Bladder Cancer Revealed by Deep Learning-Based Multi-Omics Data Integration. Front. Oncol. 2021, 11, 689626.
- Wang, Z.; Gerstein, M.; Snyder, M. RNA-Seq: A Revolutionary Tool for Transcriptomics. Nat. Rev. Genet. 2009, 10, 57–63.
- Sealfon, S.C.; Chu, T.T. RNA and DNA Microarrays. Methods Mol. Biol. 2011, 671, 3–34.
- Ramasamy, A.; Chowdhury, S. Big Data Quality Dimensions: A Systematic Literature Review. J. Inf. Syst. Technol. Manag. 2020, 17, e202017003.
- Bos, J.W.; Lauter, K.; Naehrig, M. Private Predictive Analysis on Encrypted Medical Data. J. Biomed. Inform. 2014, 50, 234–243.
- Errica, F.; Podda, M.; Bacciu, D.; Micheli, A. A Fair Comparison of Graph Neural Networks for Graph Classification. arXiv 2019, arXiv:1912.09893.
- Zednik, C. Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence. Philos. Technol. 2021, 34, 265–288.
- Wickremasinghe, D.; Kuruvilla, S.; Mays, N.; Avan, B.I. Taking Knowledge Users’ Knowledge Needs into Account in Health: An Evidence Synthesis Framework. Health Policy Plan. 2016, 31, 527–537.
- Morrison, A.; Polisena, J.; Husereau, D.; Moulton, K.; Clark, M.; Fiander, M.; Mierzwinski-Urban, M.; Clifford, T.; Hutton, B.; Rabb, D. The Effect of English-Language Restriction on Systematic Review-Based Meta-Analyses: A Systematic Review of Empirical Studies. Int. J. Technol. Assess. Health Care 2012, 28, 138–144.

| First Author & Reference | Year | Country | Study Design 1 | Sample Size 2 | Cancer Type | Clinical Data | Molecular Data | Image Data | Predictive Analytics | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AP | Other | Omics | Non-Omics | |||||||||||
| G | E | T | P | |||||||||||
| Zhu [68] | 2016 | USA | RCS | 111 patients | LUAD | ✔ | ✔ | Conventional Statistics | ||||||
| Cheng [69] | 2017 | USA, China | RCS | 410 patients | ccRCC | ✔ | ✔ | |||||||
| Dos Reis [70] | 2017 | UK | MC RCS | 5738 patients | Breast cancer | ✔ | ✔ | |||||||
| Sperduto [71] | 2017 | USA | MC RCS | 2186 patients | NSCLC | ✔ | ✔ | ✔ | ||||||
| Elwood [72] | 2018 | New Zealand | MC PCS | 9182 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Matsuo [73] | 2019 | USA | RCS | 768 patients | Cervical cancer | ✔ | ✔ | |||||||
| Mohebian [74] | 2017 | Iran, Spain | SI RCS | 579 patients | Breast cancer | ✔ | ✔ | ✔ | Machine Learning | |||||
| Obrzut [75] | 2017 | Poland | SI RCS | 102 patients | Cervical cancer | ✔ | ✔ | ✔ | ||||||
| Zhu [76] | 2017 | USA | RCS | 3382 samples | 14 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Chaudhary [61] | 2018 | USA | RCS | 360 patients | Hepatocellular carcinoma | ✔ | ✔ | |||||||
| Sun [77] | 2018 | China | RCS | 578 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Zhang [78] | 2018 | USA, China | RCS | 380 samples | Neuroblastoma | ✔ | ✔ | |||||||
| Zhao [79] | 2018 | USA | MC PCS | 1874 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Cheerla [80] | 2019 | USA | MC RCS | 11,160 patients | 20 types of cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Ferroni [81] | 2019 | Italy | SI PCS | 454 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Jing [82] | 2019 | China | MC RCS | 4630 patients | Nasopharyngeal carcinoma | ✔ | ✔ | |||||||
| Matsuo [73] | 2019 | USA | RCS | 768 patients | Cervical cancer | ✔ | ✔ | |||||||
| Sun [83] | 2019 | China | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Tapak [84] | 2019 | Iran | RCS | 550 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Baek [85] | 2020 | South Korea | RCS | 177 patients | Pancreatic adenocarcinoma | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Boeri [86] | 2020 | Italy | RCS | 610 patients | Breast cancer | ✔ | ✔ | ✔ | Machine Learning | |||||
| Choi [87] | 2020 | South Korea | MC CS-RCS | 205 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ✔ | |||||
| Zhang [88] | 2020 | China | RCS | 251 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ||||||
| Arya [89] | 2020 | India | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Tong [90] | 2020 | USA | RCS | ~1000 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Owens [91] | 2021 | UK | RCS | 352 patients | Hepatocellular carcinoma | ✔ | ✔ | |||||||
| Malik [92] | 2021 | India | RCS | 532 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Zhao [93] | 2021 | China | RCS | 474 patients | Low-Grade Glioma | ✔ | ✔ | ✔ | ||||||
| Hassanzadeh [94] | 2021 | USA | RCS | 836 patients | 3 types of cancer | ✔ | ✔ | |||||||
| Zhang [95] | 2021 | UK | RCS | 131 patients | 35 types of cancer | ✔ | ✔ | |||||||
| Chharia [96] | 2021 | India | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Yousefi [97] | 2017 | USA | RCS | 3323 patients | 5 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | Mixed Approach | |||
| Katzman [98] | 2018 | USA | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Mobadersany [99] | 2018 | USA | RCS | 769 patients | Gliomas | ✔ | ✔ | ✔ | ||||||
| Huang [100] | 2019 | USA, China | RCS | 583 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Wang [101] | 2019 | China | MC RCS | 245 patients | HGSOC | ✔ | ✔ | |||||||
| Shao [102] | 2020 | China | RCS | 1324 patients | LUSC, breast cancer, LIHC | ✔ | ✔ | |||||||
| Chen [103] | 2020 | USA | RCS | 1186 patients | Glioma and ccRCC | ✔ | ✔ | ✔ | ||||||
| Hao [104] | 2020 | USA | RCS | 447 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ||||||
| Ning [105] | 2020 | Germany | RCS | 209 patients | ccRCC | ✔ | ✔ | ✔ | ||||||
| Zhang [110] | 2021 | China | RCS | 454 patients | Bladder cancer | ✔ | ✔ | ✔ | ||||||
| Chai [106] | 2021 | China | RCS | 5032 patients | 15 types of cancer | ✔ | ✔ | ✔ | ||||||
| Vale-Silva [107] | 2021 | Germany | RCS | 11,081 patients | 33 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Wang [108] | 2021 | China | RCS | Not specified | 7 types of cancer | ✔ | ✔ | ✔ | ||||||
| Poirion [109] | 2021 | USA | RCS | 10,000 samples | 32 types of cancer | ✔ | ✔ | |||||||

| Subtype of Clinical Data | Variables | Used by |
|---|---|---|
| Demographic data | Age at diagnosis | [70,71,72,73,74,75,76,77,79,80,81,82,83,84,85,87,89,92,97,98,100,104,107] |
| | Gender | [76,80,82,85,87,89,92,97,98,104,107] |
| | Ethnicity | [72,73,80,89,98,107] |
| General measures of health status | BMI | [73,75,81,82,98] |
| | Temperature | [98] |
| | Respiration rate | [98] |
| | Systolic and diastolic blood pressure | [73,98] |
| | Heart rate | [73,98] |
| | Menopausal status | [79,81,89,97] |
| | Lifestyle (e.g., smoking habit) | [85] |
| | Prior malignancies | [107] |
| | Presence/absence of comorbidities (e.g., hypercholesterolemia, hypertension, diabetes mellitus, synchronous malignancies, etc.) | [73,75,81,85,89,98,107] |
| | Number of comorbidities | [98] |
| | Risk factors (e.g., high-sensitivity C-reactive protein, etc.) | [76,82,98] |
| Laboratory test results data | Blood cell count (e.g., leukocytes, platelets) | [73,98] |
| | Haemoglobin level | [73,82] |
| | Serum metabolites/enzymes level (e.g., sugar, urea, creatinine, bicarbonate, albumin, lactate dehydrogenase, etc.) | [73,81,82,98] |
| Surgery-related data | Surgery time | [75] |
| | Median blood loss | [75] |
| | Presence of intraoperative complications | [75] |
| | Type of complications | [75] |
| | Length of hospital stay | [75] |
| Pathological data | Mode of detection (clinical or screening) | [70] |
| | Cancer type (primary site) | [80,85,107] |
| | Cellularity of tumour content | [79] |
| | Degree of abnormality of cancer cells | [79] |
| | Primary tumour laterality | [79] |
| | Primary tumour size | [70,72,74,75,79,83,86,89] |
| | Presence/absence of multifocal tumours | [86] |
| | Surgery status | [73,79] |
| | Type of surgery | [74,84,89] |
| | Resection extent | [87] |
| | Parametrial involvement (in cervical cancer) | [75] |
| | Skin or chest wall invasion (in breast cancer) | [86] |
| | Lymph node status | [75] |
| | Number of positive lymph nodes | [70,72,83,86,89,92,97] |
| | Lymph node involvement ratio | [74,75] |
| | Lymph-vascular space invasion | [72,75] |
| | Deep stromal invasion | [75] |
| | Histologic type and subtype | [72,73,75,76,84,92,97] |
| | Histological grade | [70,75,76,80,81,83,84,85,86,87,89,92,99,105] |
| | T Stage | [82] |
| | N Stage | [82,92] |
| | M Stage | [86,92,105,107] |
| | Stage (e.g., pTNM, NPI, FIGO staging system) | [73,75,76,79,81,84,85,92,97,101,105,107] |
| | Number of brain metastases | [71] |
| | Presence/absence of distant metastasis at diagnosis | [72] |
| Therapy-related data | Prior treatment | [89,107] |
| | Radiotherapy (yes/no) | [73,75,85,89,97,98,107] |
| | Chemotherapy (yes/no) | [70,73,79,89,98,107] |
| | Targeted therapy (yes/no) (e.g., hormonal therapy, anti-HER2 therapy, etc.) | [70,74,79,86,89,98,107] |
| | Response to chemotherapy (complete/partial/none) | [86] |
| Karnofsky Performance Status (KPS) | [71] | 

| Type of Omics Data | Methods | Variables | Used by |
|---|---|---|---|
| Genomics | WGS, WES, targeted sequencing, DNA microarrays | Germline variants | [76] |
| | | Somatic point mutations (e.g., SNVs, indels) | [76,85,89,90,92,93,96,103,106,107,108,110] |
| | | Mutational status of genes | [79,90,98,103] |
| | | CNAs | [76,77,78,79,83,88,89,90,92,93,96,97,103,106,107,108,110] |
| | | CNB | [100] |
| | | TMB | [100] |
| Epigenomics | DNA methylation arrays, bisulphite sequencing | DNA methylation data | [61,76,77,85,90,91,92,93,94,95,106,107,108,109,110] |
| Transcriptomics | RNA-Seq, RNA microarrays | mRNA levels | [61,68,69,76,77,78,79,80,85,88,89,90,91,92,94,96,97,100,102,103,107,108,109] |
| | | miRNA levels | [76,80,85,90,91,92,94,100,107,108,109] |
| | | Gene expression profiles | [83,93,95,104,105,106,110] |
| Proteomics | RPPA | Protein expression levels | [77,91,92,97] |

| Type of Molecular Data | Methods | Variables | Used by |
|---|---|---|---|
| IHC data | Immunohistochemical staining | Presence/absence of proteins in tumour tissue (e.g., ER, PR, Ki-67) | [72,74,75,81,84,89,98,100] |
| | | Percentage of protein expression in tumour tissue (e.g., ER, Ki-67, etc.) | [86] |
| | | Over-expression of proteins in tumour tissue (e.g., HER-2) | [79] |
| Genetic data | PCR-based methods | The molecular subtype of cancer (luminal A, luminal B, HER-2 positive luminal B, non-luminal HER-2 positive, triple-negative) | [74] |
| | | Somatic point mutations (e.g., IDH R132H mutation) | [87] |
| | | Mutational status of genes | [71,99] |

| Methods | Type of Data | Features | Used by | 
|---|---|---|---|
| Image segmentation and hand-crafted features | WSIs | Quantitative image features | [27,28,37,48,54] | 
| ROIs from WSIs | [80,99,103,104,105,107] | ||
| MRI images | Quantitative image features | [87] | |
| CT images | ROIs | [101,105] | 

| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison | 
|---|---|---|---|---|---|---|---|
| Zhu [68] | SuperPC regression | 10-fold CV | HR and log-rank test p-value | HR. Dichotomization of patients into high-risk and low-risk | ✔ | ||
| Cheng [69] | Lasso–Cox model | 10-fold CV | Log-rank test p-value | Risk index of death | ✔ | ||
| Dos Reis [70] | Multivariate CPH regression within a multivariable fractional polynomial model | No | AUC | Risk index of death at 10 years | ✔ | ✔ | ✔ | 
| Sperduto [71] | Multivariate multiple CPH regression | No | None | Lung-molGPA score | ✔ | ||
| Elwood [72] | Multivariate CPH regression | Bootstrapping for internal and external validation | C-index | Predicted OS (months) at 10 years | ✔ | ✔ | |
| Matsuo [73] | Multivariate CPH regression | 10-fold CV | MAE, C-index | Survival risk index, PFS, and OS | ✔ | 

| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison | 
|---|---|---|---|---|---|---|---|
| Matsuo [73] | DNN | 10-CV | MAE, C-index | Predicted OS and PFS | ✔ | ||
| Mohebian [74] | BDT | Bagging, hold-out and 4-CV | Sn, Sp, Acc, precision, F-score, AUC, MCC, +LR, -LR, DOR, DP, κ | Patient dichotomization | ✔ | ✔ | |
| Obrzut [75] | PNN, MLP, GEP, SVM, RBFNN, and K-means | 10-CV | Acc, Sn, Sp, AUC | Predicted OS at 5 years | ✔ | ||
| Zhu [76] | MOK | Monte Carlo CV | C-index | Predicted overall prognostic score | ✔ | ✔ | ✔ | 
| Chaudhary [61] | DL-based model | 5-CV and 10-CV | C-index, log-rank p-value, and BS | Patient dichotomization | ✔ | ✔ | ✔ | 
| Sun [77] | SimpleMKL | 10-CV | AUC, Acc, precision, MCC, and C-index | Patient dichotomization | ✔ | ✔ | |
| Zhang [78] | ANN, K-means, SVM, and XGBoost | 10-CV | AUC | Predicted OS and patient dichotomization | ✔ | ✔ | ✔ | 
| Zhao [79] | Gradient Boosting, RF, SVM, and ANN | 10-CV | ROC curve, Acc, CS, stability | Patient dichotomization | ✔ | ✔ | |
| Cheerla [80] | DNN | Hold-out | C-index | Predicted OS | ✔ | ||
| Ferroni [81] | MKL based on SVM | 3-CV | AUC, Sn, Sp, F-score, LR, HR, C-index, and Acc | Patient dichotomization | |||
| Jing [82] | DNN | Bootstrapping | C-index | Predicted DFS and patient dichotomization | ✔ | ✔ | |
| Sun [83] | DNN | 10-CV | ROC curve, AUC, Sn, Sp, Acc, precision, MCC | Patient dichotomization | ✔ | ✔ | ✔ | 
| Tapak [84] | NB, RF, AdaBoost, SVM, LS-SVM, AdaBag | Hold-out | Sn, Sp, PPV, NPV, +LR, -LR, Acc | Patient dichotomization | ✔ | ||
| Baek [85] | SVM, LR, L2RR, RF | Hold-out and 5-CV | Acc, AUC, C-index, IBS | Predicted DFS and OS at 5 years | ✔ | ✔ | |
| Boeri [86] | SVM, ANN | 3-CV | Acc, Sn, Sp, AUC | Risk of recurrence and risk of death | ✔ | ||
| Choi [87] | RSF | Bagging | iAUC | Predicted OS and patient dichotomization | ✔ | ✔ | |
| Zhang [88] | MKL based on SVM | 10-CV | AUC | Patient dichotomization | ✔ | ✔ | |
| Arya [89] | Ensemble of CNNs and RF | 10-CV | AUC, Sn, Sp, Acc, precision, MCC | Patient dichotomization | ✔ | ✔ | ✔ | 
| Tong [90] | ANN | 4-CV | C-index | HR | ✔ | ||
| Owens [91] | DL-based model | Not detailed | Silhouette score, log-rank p-value | Patient dichotomization | ✔ | ✔ | |
| Malik [92] | DL-based model | 10-CV | AUC, Acc, Sn, Sp, FPR, F1-Score, MCC, κ | Patient dichotomization | ✔ | ✔ | ✔ | 
| Zhao [93] | ANN | 10-CV | C-index | Patient dichotomization | ✔ | ✔ | |
| Hassanzadeh [94] | DL-based model | Hold-out and 5-CV | Acc | Patient dichotomization | ✔ | ✔ | |
| Zhang [95] | DL-based model | Not detailed | C-index, IBS | Predicted OS | ✔ | ✔ | |
| Chharia [96] | DL-based model | 5-CV | Precision, Acc | Probability of survival and patient dichotomization | ✔ | ✔ | 
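
The "k-CV" entries in the Validation Technique(s) column denote k-fold cross-validation. As a minimal sketch of what that splitting involves, assuming a plain, unstratified random partition (the reviewed studies may stratify by outcome or use library routines instead), the helper below yields train/test index lists. The name `k_fold_indices` and its parameters are hypothetical and are not taken from any of the cited papers.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Illustrative helper (hypothetical name): samples are shuffled once,
    dealt into k folds, and each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Deal shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 10 hypothetical patients, 5 folds: each patient is tested exactly once.
    for fold_id, (train, test) in enumerate(k_fold_indices(10, 5)):
        print(fold_id, sorted(test))
```

A hold-out validation corresponds to using only one such train/test split, which is why its performance estimate tends to vary more with the particular split chosen.
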

| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison | 
|---|---|---|---|---|---|---|---|
| Yousefi [97] | DL-CPH | Monte Carlo CV | C-index | Risk index of death, correlated to OS | ✔ | ||
| Katzman [98] | DL-CPH | Bootstrapping, hold-out and 3-CV | C-index | HR | ✔ | ✔ | |
| Mobadersany [99] | DL-CPH | Monte Carlo CV | C-index | HR | ✔ | ||
| Huang [100] | DL-CPH | 5-CV | C-index, log-rank test p-value | HR and patient dichotomization | ✔ | ✔ | |
| Wang [101] | DL-CPH | Not detailed | C-index, AUC, Acc, log-rank test p-value | Risk index of recurrence, correlated to RFS, and patient dichotomization. Recurrence probability at a specific time point | ✔ | ✔ | |
| Shao [102] | Adaboost for diagnosis and CPH for prognosis | 5-CV | C-index, BS | Risk index of death and patient dichotomization | ✔ | ✔ | |
| Chen [103] | DL-CPH | 15-CV | C-index | Patient dichotomization | ✔ | ||
| Hao [104] | DL-CPH | Not detailed | C-index | Patient dichotomization | ✔ | ✔ | |
| Ning [105] | DL-CPH | 10-CV | C-index | Patient dichotomization | ✔ | ||
| Chai [106] | DL-CPH | Not detailed | C-index | Patient dichotomization | ✔ | ✔ | ✔ | 
| Vale-Silva [107] | DNN-based model | Hold-out | Ctd, IBS | Conditional survival probability for 1 to 30 years | ✔ | ✔ | |
| Wang [108] | NMF-CPH | 3-CV | C-index | Survival probability and patient dichotomization | ✔ | ✔ | ✔ | 
| Poirion [109] | Ensemble of DL and SVM models | Hold-out and 5-CV | Log-rank p-value, C-index and Silhouette score | Patient’s risk of death | ✔ | ✔ | ✔ | 
| Zhang [110] | DL-CPH | 10-CV | AUC | Patient dichotomization | ✔ | ✔ | 
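
The C-index is the most frequently reported performance metric in the tables above. As a rough illustration of what it measures, the sketch below computes Harrell's concordance index for right-censored data: among comparable patient pairs, the fraction in which the patient with the shorter observed survival time received the higher predicted risk. This is a simplified quadratic-time version that skips pairs with tied times; the function name and the toy data are illustrative assumptions, not the implementation used by any reviewed study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (O(n^2) sketch).

    times       -- observed follow-up time per patient
    events      -- 1/True if the event (e.g., death) was observed, 0/False if censored
    risk_scores -- predicted risk; higher should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: four hypothetical patients, the third one censored.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so C-index values are only comparable across studies when censoring patterns and follow-up times are similar.
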

| Type of Repository | Repositories & Programs/Studies | Used by | 
|---|---|---|
| Public Repositories | ICGC Data Portal (e.g., Pan-Cancer Atlas Initiative, TCGA Program) | [61,69,76,77,80,83,85,87,88,89,90,91,92,93,94,95,97,99,100,102,103,104,105,106,107,108,109,110] | 
| EGA (e.g., METABRIC Study) | [79,82,83,89,96,98,109] | |
| GDC Data Portal (e.g., TARGET Program) | [61,76,78,95] | |
| GEO | [61,95,106,108,109,110] | |
| COSMIC | [85] | |
| ArrayExpress Archive of Functional Genomics Data | [61] | |
| Institutional databases | N/A | [71,72,73,74,75,81,82,84,86,87,101] | 

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lobato-Delgado, B.; Priego-Torres, B.; Sanchez-Morillo, D. Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers 2022, 14, 3215. https://doi.org/10.3390/cancers14133215