An Explainable Radiogenomic Framework to Predict Mutational Status of KRAS and EGFR in Lung Adenocarcinoma Patients
Abstract
1. Introduction
- A realization of a radiogenomic pipeline to characterize CT images of patients with EGFR and KRAS mutations;
- A systematic comparison of feature selection techniques to enhance the accuracy and reliability of the developed models;
- An explainability analysis to corroborate machine learning findings with existing studies in radiology.
2. Related Works
3. Materials and Methods
3.1. Datasets
3.2. Radiomic Features
3.3. Feature Selection
- FS1 was obtained as follows: univariate binary logistic regression models were trained for each of the two genes on the input features after z-score normalization. Features with a p-value above the 0.05 threshold were discarded, and only the uncorrelated features were retained by exploiting the algorithm described in Bevilacqua et al. [15]. The correlation between quantitative features was estimated with the Pearson correlation coefficient.
- FS2 was obtained in the same way as FS1 but considering only features with a p-value less than 0.01 and without exploiting the algorithm [15] to retain the uncorrelated features.
- FS3 and FS4 were obtained in the same way as FS1 and FS2, respectively, but without performing the preliminary z-score normalization.
- FS5 was obtained in the same way as FS3 but applying the Benjamini–Hochberg correction to the p-values.
- FS6 was obtained by applying the Mann–Whitney U test to calculate the p-values. Subsequently, the features with a p-value above the 0.05 threshold were discarded.
- FS7 was obtained by applying a two-sided t-test after feature normalization to calculate the p-values. Subsequently, the features with a p-value above the 0.05 threshold were discarded.
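
The filtering steps shared by the feature sets above can be illustrated with a minimal sketch. It assumes the per-feature p-values have already been computed (by logistic regression, the Mann–Whitney U test, or the t-test) and only shows the thresholding, the optional Benjamini–Hochberg correction, and a correlation filter. The greedy Pearson filter used here is a simple stand-in and is not the algorithm of Bevilacqua et al. [15]; the function names, the toy feature names, and the `max_abs_r=0.9` cut-off are hypothetical.

```python
from math import sqrt


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear correlation
    return sxy / sqrt(sxx * syy)


def select_features(columns, p_values, alpha=0.05, correct=False, max_abs_r=None):
    """Keep features whose (optionally corrected) p-value is below alpha.

    If max_abs_r is given, a feature is kept only when its absolute Pearson
    correlation with every already kept feature is below max_abs_r
    (hypothetical greedy filter, most significant feature first).
    """
    names = list(columns)
    scores = [p_values[name] for name in names]
    if correct:
        scores = benjamini_hochberg(scores)
    significant = sorted((s, name) for s, name in zip(scores, names) if s < alpha)
    selected = []
    for _, name in significant:
        if max_abs_r is None or all(
            abs(pearson(columns[name], columns[kept])) < max_abs_r
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data: four hypothetical features measured on four patients.
columns = {
    "feat_a": [1, 2, 3, 4],
    "feat_b": [2, 4, 6, 8.5],  # almost collinear with feat_a
    "feat_c": [4, 1, 3, 2],
    "feat_d": [1, 1, 2, 2],
}
p_values = {"feat_a": 0.01, "feat_b": 0.04, "feat_c": 0.03, "feat_d": 0.20}

fs1_like = select_features(columns, p_values, alpha=0.05, max_abs_r=0.9)
fs5_like = select_features(columns, p_values, alpha=0.05, correct=True, max_abs_r=0.9)
```

In this toy case the uncorrected selection keeps `feat_a` and `feat_c` and drops `feat_b` as redundant, while the corrected selection keeps only `feat_a`, which shows how the Benjamini–Hochberg step can shrink a feature set.
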
3.4. Prediction of KRAS and EGFR Mutational Status
- FSC1: FS1E, FS1K, FS2E, FS2K, FS3E, FS3K, FS4E, and FS4K were merged to predict both EGFR and KRAS;
- FSC2: FS1E, FS2E, FS3E, and FS4E were merged to predict EGFR; FS1K, FS2K, FS3K, and FS4K were merged to predict KRAS;
- FSC3: FS2E and FS4E were merged to predict EGFR; FS2K and FS4K were merged to predict KRAS;
- FSC4: FS1E and FS2E were merged to predict EGFR, and FS1K and FS2K were merged to predict KRAS;
- FSC5: FS3E and FS4E were merged to predict EGFR, and FS3K and FS4K were merged to predict KRAS;
- FSC6: FS6E and FS7E were merged to predict EGFR, and FS6K and FS7K were merged to predict KRAS;
- FSC7: FS1E, FS2E, FS3E, FS4E, FS5E, FS6E, and FS7E were merged to predict EGFR, and FS1K, FS2K, FS3K, FS4K, FS5K, FS6K, and FS7K were merged to predict KRAS.
- Experiment 1: We trained the model on the publicly available dataset (D1) and tested on the external dataset (D2) to evaluate its ability to generalize on new data;
- Experiment 2: We performed internal cross-validation by merging D1 and D2 to assess the performance difference with respect to external validation. We employed a 10-fold cross-validation with a stratified method to ensure an even distribution of classes across each fold.
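
The merging of feature sets and the stratified 10-fold split of Experiment 2 can be sketched as follows. This is a minimal illustration under two assumptions: that "merged" means the union of the selected feature names, and that stratification deals the samples of each class evenly across folds. The helper names and the feature names are hypothetical, and no shuffling is applied, so in practice the indices would be permuted first.

```python
from collections import defaultdict


def merge_feature_sets(*feature_sets):
    """Union of feature names, keeping the first-seen order."""
    merged, seen = [], set()
    for feature_set in feature_sets:
        for name in feature_set:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def stratified_folds(labels, k=10):
    """Split sample indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        # Deal the samples of this class round-robin, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for j, index in enumerate(by_class[label]):
            folds[(offset + j) % k].append(index)
        offset += len(by_class[label])
    return folds


# Hypothetical selected features for EGFR (names are placeholders).
fs1e = ["shape_sphericity", "glcm_contrast"]
fs2e = ["glcm_contrast"]
fsc4_egfr = merge_feature_sets(fs1e, fs2e)

# Toy cohort: 100 patients, 23 of them mutated (label 1).
labels = [0] * 77 + [1] * 23
folds = stratified_folds(labels, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on train_idx and evaluated on test_fold here.
```

Experiment 1 needs no splitting logic of this kind, since D1 is used entirely for training and D2 entirely for testing.
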
3.5. Radiogenomic Features Explainability
4. Results
4.1. Univariate Analysis
- FS1E resulted in 4 features with a p-value below the 0.05 threshold that also passed the correlation filter;
- FS1K resulted in 3 features with a p-value below the 0.05 threshold that also passed the correlation filter;
- FS2E resulted in 1 feature with a p-value less than 0.01;
- FS2K did not result in the selection of any significant feature.
- FS3E resulted in 14 features with a p-value below the 0.05 threshold that also passed the correlation filter;
- FS3K resulted in 13 features with a p-value below the 0.05 threshold that also passed the correlation filter;
- FS4E resulted in 13 features with a p-value below the 0.01 threshold that also passed the correlation filter;
- FS4K resulted in 12 features with a p-value below the 0.01 threshold that also passed the correlation filter;
- FS5E resulted in 14 features with a q-value below the 0.05 threshold that also passed the correlation filter;
- FS5K resulted in 14 features with a q-value below the 0.05 threshold that also passed the correlation filter;
- FS6E resulted in 3 features with a p-value below the 0.05 threshold;
- FS6K resulted in 6 features with a p-value below the 0.05 threshold;
- FS7E resulted in 2 features with a p-value below the 0.05 threshold;
- FS7K resulted in 3 features with a p-value below the 0.05 threshold.
4.2. Predictive Models
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249.
- Travis, W.D. Pathology of lung cancer. Clin. Chest Med. 2011, 32, 669–692.
- Li, C.; Lu, H. Adenosquamous carcinoma of the lung. OncoTargets Ther. 2018, 4829–4835.
- Herbst, R.S.; Morgensztern, D.; Boshoff, C. The biology and management of non-small cell lung cancer. Nature 2018, 553, 446–454.
- Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 2022, 72, 7–33.
- Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 2014, 511, 543.
- Del Re, M.; Rofi, E.; Restante, G.; Crucitta, S.; Arrigoni, E.; Fogli, S.; Di Maio, M.; Petrini, I.; Danesi, R. Implications of KRAS mutations in acquired resistance to treatment in NSCLC. Oncotarget 2018, 9, 6630.
- Mok, T.S.; Wu, Y.L.; Thongprasert, S.; Yang, C.H.; Chu, D.T.; Saijo, N.; Sunpaweravong, P.; Han, B.; Margono, B.; Ichinose, Y.; et al. Gefitinib or carboplatin–paclitaxel in pulmonary adenocarcinoma. N. Engl. J. Med. 2009, 361, 947–957.
- Suda, K.; Tomizawa, K.; Mitsudomi, T. Biological and clinical significance of KRAS mutations in lung cancer: An oncogenic driver that contrasts with EGFR mutation. Cancer Metastasis Rev. 2010, 29, 49–60.
- Brunetti, A.; Altini, N.; Buongiorno, D.; Garolla, E.; Corallo, F.; Gravina, M.; Bevilacqua, V.; Prencipe, B. A Machine Learning and Radiomics Approach in Lung Cancer for Predicting Histological Subtype. Appl. Sci. 2022, 12, 5829.
- Altini, N.; Puro, E.; Taccogna, M.G.; Marino, F.; De Summa, S.; Saponaro, C.; Mattioli, E.; Zito, F.A.; Bevilacqua, V. Tumor Cellularity Assessment of Breast Histopathological Slides via Instance Segmentation and Pathomic Features Explainability. Bioengineering 2023, 10, 396.
- Prencipe, B.; Altini, N.; Cascarano, G.D.; Brunetti, A.; Guerriero, A.; Bevilacqua, V. Focal dice loss-based V-Net for liver segments classification. Appl. Sci. 2022, 12, 3247.
- Altini, N.; Marvulli, T.M.; Caputo, M.; Mattioli, E.; Prencipe, B.; Cascarano, G.D.; Brunetti, A.; Tommasi, S.; Bevilacqua, V.; De Summa, S.; et al. Multi-class Tissue Classification in Colorectal Cancer with Handcrafted and Deep Features. In Proceedings of the Intelligent Computing Theories and Application: 17th International Conference, ICIC 2021, Shenzhen, China, 12–15 August 2021; Proceedings, Part I 17; Springer: Cham, Switzerland, 2021; pp. 512–525.
- Altini, N.; Marvulli, T.M.; Zito, F.A.; Caputo, M.; Tommasi, S.; Azzariti, A.; Brunetti, A.; Prencipe, B.; Mattioli, E.; De Summa, S.; et al. The role of unpaired image-to-image translation for stain color normalization in colorectal cancer histology classification. Comput. Methods Programs Biomed. 2023, 234, 107511.
- Bevilacqua, V.; Altini, N.; Prencipe, B.; Brunetti, A.; Villani, L.; Sacco, A.; Morelli, C.; Ciaccia, M.; Scardapane, A. Lung segmentation and characterization in COVID-19 patients for assessing pulmonary thromboembolism: An approach based on deep learning and radiomics. Electronics 2021, 10, 2475.
- Rahmim, A.; Schmidtlein, C.R.; Jackson, A.; Sheikhbahaei, S.; Marcus, C.; Ashrafinia, S.; Soltani, M.; Subramaniam, R.M. A novel metric for quantification of homogeneous and heterogeneous tumors in PET for enhanced clinical outcome prediction. Phys. Med. Biol. 2015, 61, 227.
- Zwanenburg, A.; Vallières, M.; Abdalah, M.A.; Aerts, H.J.; Andrearczyk, V.; Apte, A.; Ashrafinia, S.; Bakas, S.; Beukinga, R.J.; Boellaard, R.; et al. The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 2020, 295, 328–338.
- Rizzo, S.; Botta, F.; Raimondi, S.; Origgi, D.; Fanciullo, C.; Morganti, A.G.; Bellomi, M. Radiomics: The facts and the challenges of image analysis. Eur. Radiol. Exp. 2018, 2, 36.
- Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; Van Stiphout, R.G.; Granton, P.; Zegers, C.M.; Gillies, R.; Boellard, R.; Dekker, A.; et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 2012, 48, 441–446.
- Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; Forster, K.; Aerts, H.J.; Dekker, A.; Fenstermacher, D.; et al. Radiomics: The process and the challenges. Magn. Reson. Imaging 2012, 30, 1234–1248.
- Aerts, H.J.; Velazquez, E.R.; Leijenaar, R.T.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D.; et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 2014, 5, 4006.
- Thawani, R.; McLane, M.; Beig, N.; Ghose, S.; Prasanna, P.; Velcheti, V.; Madabhushi, A. Radiomics and radiogenomics in lung cancer: A review for the clinician. Lung Cancer 2018, 115, 34–41.
- Reginelli, A.; Nardone, V.; Giacobbe, G.; Belfiore, M.P.; Grassi, R.; Schettino, F.; Del Canto, M.; Grassi, R.; Cappabianca, S. Radiomics as a new frontier of imaging for cancer prognosis: A narrative review. Diagnostics 2021, 11, 1796.
- Jia, T.Y.; Xiong, J.F.; Li, X.Y.; Yu, W.; Xu, Z.Y.; Cai, X.W.; Ma, J.C.; Ren, Y.C.; Larsson, R.; Zhang, J.; et al. Identifying EGFR mutations in lung adenocarcinoma by noninvasive imaging using radiomics features and random forest modeling. Eur. Radiol. 2019, 29, 4742–4750.
- Mei, D.; Luo, Y.; Wang, Y.; Gong, J. CT texture analysis of lung adenocarcinoma: Can Radiomic features be surrogate biomarkers for EGFR mutation statuses. Cancer Imaging 2018, 18, 52.
- Shiri, I.; Maleki, H.; Hajianfar, G.; Abdollahi, H.; Ashrafinia, S.; Hatt, M.; Zaidi, H.; Oveisi, M.; Rahmim, A. Next-generation radiogenomics sequencing for prediction of EGFR and KRAS mutation status in NSCLC patients using multimodal imaging and machine learning algorithms. Mol. Imaging Biol. 2020, 22, 1132–1148.
- Le, N.Q.K.; Kha, Q.H.; Nguyen, V.H.; Chen, Y.C.; Cheng, S.J.; Chen, C.Y. Machine learning-based radiomics signatures for EGFR and KRAS mutations prediction in non-small-cell lung cancer. Int. J. Mol. Sci. 2021, 22, 9254.
- Rios Velazquez, E.; Liu, Y.; Parmar, C.; Narayan, V.; Gillies, R.; Aerts, H. MO-DE-207B-08: Radiomic CT Features Complement Semantic Annotations to Predict EGFR Mutations in Lung Adenocarcinomas. Med. Phys. 2016, 43, 3706.
- Liu, Y.; Kim, J.; Balagurunathan, Y.; Li, Q.; Garcia, A.L.; Stringfield, O.; Ye, Z.; Gillies, R.J. Radiomic features are associated with EGFR mutation status in lung adenocarcinomas. Clin. Lung Cancer 2016, 17, 441–448.
- Pinheiro, G.; Pereira, T.; Dias, C.; Freitas, C.; Hespanhol, V.; Costa, J.L.; Cunha, A.; Oliveira, H.P. Identifying relationships between imaging phenotypes and lung cancer-related mutation status: EGFR and KRAS. Sci. Rep. 2020, 10, 3625.
- Moreno, S.; Bonfante, M.; Zurek, E.; Cherezov, D.; Goldgof, D.; Hall, L.; Schabath, M. A radiogenomics ensemble to predict EGFR and KRAS mutations in NSCLC. Tomography 2021, 7, 154–168.
- Bakr, S.; Gevaert, O.; Echegaray, S.; Ayers, K.; Zhou, M.; Shafiq, M.; Zheng, H.; Benson, J.A.; Zhang, W.; Leung, A.N.; et al. A radiogenomic dataset of non-small cell lung cancer. Sci. Data 2018, 5, 180202.
- Haralick, R.M.; Dinstein, I.; Shanmugam, K. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
- Galloway, M.M. Texture analysis using gray level run lengths. Comput. Graph. Image Process. 1975, 4, 172–179.
- Chu, A.; Sehgal, C.M.; Greenleaf, J.F. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit. Lett. 1990, 11, 415–419.
- Tang, X. Texture information in run-length matrices. IEEE Trans. Image Process. 1998, 7, 1602–1609.
- Sun, C.; Wee, W.G. Neighboring gray level dependence matrix for texture classification. Comput. Vision Graph. Image Process. 1983, 23, 341–352.
- Amadasun, M.; King, R. Textural Features Corresponding to Textural Properties. IEEE Trans. Syst. Man Cybern. 1989, 19, 1264–1274.
- Van Griethuysen, J.J.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.; Fillion-Robin, J.C.; Pieper, S.; Aerts, H.J. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017, 77, e104–e107.
- Wang, P.; Pei, X.; Yin, X.P.; Ren, J.L.; Wang, Y.; Ma, L.Y.; Du, X.G.; Gao, B.L. Radiomics models based on enhanced computed tomography to distinguish clear cell from non-clear cell renal cell carcinomas. Sci. Rep. 2021, 11, 13729.
- Liu, Q.; Sun, D.; Li, N.; Kim, J.; Feng, D.; Huang, G.; Wang, L.; Song, S. Predicting EGFR mutation subtypes in lung adenocarcinoma using 18F-FDG PET/CT radiomic features. Transl. Lung Cancer Res. 2020, 9, 549.
- Wu, S.; Meng, J.; Yu, Q.; Li, P.; Fu, S. Radiomics-based machine learning methods for isocitrate dehydrogenase genotype prediction of diffuse gliomas. J. Cancer Res. Clin. Oncol. 2019, 145, 543–550.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
- Shapley, L.S. A value for n-person games. In Contributions to the Theory of Games; Princeton University Press: Princeton, NJ, USA, 1953; Volume 2.28, pp. 307–317.
- McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Feature Set | Normalization | p-Value: Logistic Regression | p-Value: Mann–Whitney U Test | p-Value: t-Test | p-Value Correction | p-Value Threshold: 0.05 | p-Value Threshold: 0.01 | Correlation Coefficient |
---|---|---|---|---|---|---|---|---|
FS1 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
FS2 | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
FS3 | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
FS4 | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ |
FS5 | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
FS6 | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
FS7 | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
Model | Hyperparameters | Value |
---|---|---|
LR | max_iter | 2500 |
LR | C | 1 |
LR | penalty | [‘l1’, ‘l2’] |
LR | solver | [‘liblinear’, ‘saga’] |
SVM | kernel | ‘rbf’ |
SVM | C | [0.1, 1, 10] |
SVM | gamma | [0.01, 0.1, 1] |
RF | n_estimators | [10, 50, 100] |
RF | min_samples_split | [2, 5, 10] |
RF | min_samples_leaf | [1, 2, 4] |
RF | criterion | ‘Gini’ |
RF | max_depth | [5, 10, 15, 20] |
AB | n_estimators | [50, 100, 150] |
AB | learning_rate | [0.01, 0.1, 1.0] |
MLP | hidden_layer_sizes | [(5), (10), (20), (30), (5, 5), (10, 5), (10,10), (20,5), (20,10), (30,10), (20, 20), (30,20)] |
MLP | activation | ‘Relu’ |
MLP | solver | ‘Adam’ |
MLP | alpha | 0.0001 |
MLP | learning_rate_init | 0.001 |
MLP | max_iter | 2500 |
MLP | early_stopping | False |
XGB | n_estimators | [50, 100, 150] |
XGB | learning_rate | [0.01, 0.1, 1] |
GBM | n_estimators | [50, 100, 150] |
GBM | learning_rate | [0.01, 0.1, 1] |
GNB | priors | None |
GNB | var_smoothing | |
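
To give a sense of the size of the search spaces in the hyperparameter table, the sketch below expands a subset of them into explicit candidate configurations. It assumes that the bracketed lists were searched exhaustively (a grid search), which the table suggests but does not state. The `GRIDS` dictionary and the `expand_grid` helper are hypothetical names, string values are written in lowercase, and single-layer sizes are written as one-element tuples such as `(5,)`, since `(5)` is a plain integer in Python.

```python
from itertools import product

# Search spaces transcribed from the hyperparameter table (subset of models).
# Fixed values are wrapped in one-element lists so every entry is iterable.
GRIDS = {
    "LR": {
        "max_iter": [2500],
        "C": [1],
        "penalty": ["l1", "l2"],
        "solver": ["liblinear", "saga"],
    },
    "SVM": {
        "kernel": ["rbf"],
        "C": [0.1, 1, 10],
        "gamma": [0.01, 0.1, 1],
    },
    "RF": {
        "n_estimators": [10, 50, 100],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
        "criterion": ["gini"],
        "max_depth": [5, 10, 15, 20],
    },
    "MLP": {
        "hidden_layer_sizes": [
            (5,), (10,), (20,), (30,), (5, 5), (10, 5),
            (10, 10), (20, 5), (20, 10), (30, 10), (20, 20), (30, 20),
        ],
        "activation": ["relu"],
        "solver": ["adam"],
        "alpha": [0.0001],
        "learning_rate_init": [0.001],
        "max_iter": [2500],
        "early_stopping": [False],
    },
}


def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    return [
        dict(zip(names, combo))
        for combo in product(*(grid[name] for name in names))
    ]


# Number of candidate configurations per model.
counts = {model: len(expand_grid(grid)) for model, grid in GRIDS.items()}
```

Under this assumption the random forest grid alone expands to 3 × 3 × 3 × 4 = 108 candidates, each of which would be fitted once per cross-validation fold.
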
Author | Data | Study Population | EGFR Mutation Frequency | KRAS Mutation Frequency | AUC EGFR | AUC KRAS | External Cohort |
---|---|---|---|---|---|---|---|
Shiri et al. [26] | CT, CTD, PET | NSCLC | 25% (T) 25% (V) | 25% (T) 23% (V) | 0.82 | 0.83 | ✓ |
Jia et al. [24] | CT, Clinical | LUAD | 61% (T) 58% (V) | ✗ | 0.82 | ✗ | ✓ |
Rios et al. [28] | CT, Semantic | LUAD | 45% | ✗ | 0.67 | ✗ | ✗ |
Liu et al. [29] | CT, Clinical | LUAD | 46% | ✗ | 0.71 | ✗ | ✗ |
Le et al. [27] | CT, Clinical | NSCLC | 18% | 19% | 0.89 | 0.81 | ✗ |
Pinheiro et al. [30] | CT, Clinical, Semantic | NSCLC | 20% | 23% | 0.58 | 0.50 | ✗ |
Moreno et al. [31] | CT, Clinical | NSCLC | 14% | 24% | 0.70 | 0.71 | ✗ |
Mei et al. [25] | CT, Clinical | LUAD | 51% | ✗ | 0.66 | ✗ | ✗ |
Ours | CT | LUAD | 23% (T) 13% (V) | 25% (T) 24% (V) | 0.69 | 0.82 | ✓ |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Prencipe, B.; Delprete, C.; Garolla, E.; Corallo, F.; Gravina, M.; Natalicchio, M.I.; Buongiorno, D.; Bevilacqua, V.; Altini, N.; Brunetti, A. An Explainable Radiogenomic Framework to Predict Mutational Status of KRAS and EGFR in Lung Adenocarcinoma Patients. Bioengineering 2023, 10, 747. https://doi.org/10.3390/bioengineering10070747