Decoding Lung Cancer Radiogenomics: A Custom Clustering/Classification Methodology to Simultaneously Identify Important Imaging Features and Relevant Genes
Abstract
Featured Application
1. Introduction
2. Materials and Methods
2.1. Overview
2.2. Data Collection
2.3. Data Preprocessing
2.4. Algorithm Overview
- (1) Initial clusters were generated for the genetic groupings using three test methods:
- Binary Matrix Factorization: Binary matrix factorization was used to generate initial clusters from a large m × n binary matrix with one row per subject and one column per gene, in which gene activation was encoded as 0 (‘not activated’) or 1 (‘activated’).
- Random Assignment: Patients were also randomly assigned to initial genetic clusters such that each cluster contained an equal number of patients.
- K-Means: Initial clusters were also formed by k-means clustering across the 100 columns, yielding a set of 5 initial clusters.
- (2) Next, a deep learning image classification model was run on each subject's medical imaging data, with the generated genetic clusters as the target outcome. Standard accuracy, AUC, and loss were computed to evaluate model performance. For loss, a proposed deep clustering metric was used: the ratio of training loss to testing loss. The proposed algorithm used a single CT slice per patient, and each slice was confirmed to contain lung tumor.
- Kernel Normalization: A standard CNN kernel, applied with a sliding-window technique and gradient calculation, was used to identify initial features. DICOM data are distinctive in that Hounsfield unit values from CT scans, or intensity values from MRI, often vary from pixel to pixel. Standard feature-selection and edge-detection techniques typically evaluate the gradient between neighboring pixels, so these small fluctuations can appear as large gradient changes and generate false-positive features. To prevent this, pixel values were first averaged over each receptive field of the kernel. Gaussian blurring was avoided so as not to lose information, and edge detection was not applied to the raw data, to avoid false edges and noisy images.
- Pooling and Final Fully Connected Layers: Each of the resulting shapes (feature maps) was then fed into a pooling layer, and a standard nonlinear activation function (ReLU) was applied.
- Activation Function: These feature maps were passed through a final layer to compute class scores. A final activation function, such as softmax, was used to normalize these scores into a probability distribution consistent with Luce’s choice axiom.
- Output: As with other classification models, the final output was a ranked set of probability scores with a corresponding classification, indicating whether the shapes identified in a DICOM image corresponded to one class or another based on the presence or absence of similar shapes in other images of that class.
- (3) The loss, accuracy, and AUC were then used to generate new clusters and regroup patients before retraining the model. To do this, predictions for each patient were generated with the trained image classification model. If a patient was misclassified, they were moved into a new cluster. The image classification model was then re-run on these new clusters to obtain new loss, accuracy, and AUC values. This process was repeated until the model achieved at least 90% accuracy on the data.
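The binary matrix factorization option in step (1) can be illustrated with a short NumPy sketch. This is not the authors' implementation: it assumes rows are subjects and columns are genes, approximates binary matrix factorization by fitting non-negative matrix factorization (Lee and Seung multiplicative updates) and thresholding the factors, and assigns each subject to the latent gene pattern with the largest weight. The helper names `binary_factorize` and `initial_clusters_from_bmf`, the 0.5 threshold, and the synthetic data are all hypothetical.

```python
import numpy as np

def binary_factorize(X, k, n_iter=200, threshold=0.5, seed=0):
    """Hypothetical helper: approximate binary factorization X ~ W @ H.

    X is an m x n 0/1 matrix (rows = subjects, columns = genes). Non-negative
    matrix factorization is fitted with multiplicative updates, then each
    factor is scaled to [0, 1] and thresholded to 0/1. This is a heuristic
    stand-in for an exact binary matrix factorization solver.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    eps = 1e-9
    for _ in range(n_iter):
        # Lee and Seung multiplicative updates for ||X - WH||_F^2
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Binarize: keep entries at or above `threshold` times their factor's maximum
    W_bin = (W / (W.max(axis=0, keepdims=True) + eps) >= threshold).astype(int)
    H_bin = (H / (H.max(axis=1, keepdims=True) + eps) >= threshold).astype(int)
    return W, W_bin, H_bin

def initial_clusters_from_bmf(X, k, **kwargs):
    """Assign each subject to the latent gene pattern with the largest weight."""
    W, _, _ = binary_factorize(X, k, **kwargs)
    return W.argmax(axis=1)

# Synthetic example: 20 subjects x 100 genes with random activation status
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 2, size=(20, 100))
labels_demo = initial_clusters_from_bmf(X_demo, k=5)
print(labels_demo)
```

Thresholding a continuous factorization is only an approximation; dedicated binary factorization methods optimize over 0/1 factors directly.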
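For the random-assignment option in step (1), one simple way to obtain equally sized clusters is to label patients cyclically and then shuffle the labels. The sketch below assumes 5 clusters (matching the k-means setting) and uses 281 patients, the dataset total reported in the dataset table, purely for illustration; `random_equal_clusters` is a hypothetical name. When the patient count is not divisible by the number of clusters, cluster sizes differ by at most one.

```python
import numpy as np

def random_equal_clusters(n_patients, k, seed=0):
    """Hypothetical helper: random cluster labels with (near-)equal cluster sizes."""
    rng = np.random.default_rng(seed)
    # Cyclic labels 0, 1, ..., k-1, 0, 1, ... give each cluster floor or ceil(n/k) patients
    labels = np.arange(n_patients) % k
    # Shuffling keeps the cluster sizes but randomizes which patient goes where
    return rng.permutation(labels)

labels = random_equal_clusters(281, k=5)
print(np.bincount(labels))  # [57 56 56 56 56]
```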
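The k-means option in step (1) can be sketched with plain Lloyd iterations in NumPy. The sketch assumes one row per patient with the 100 columns as features and k = 5 as in the text; initialization from randomly chosen patients, the iteration cap, and the synthetic binary data are assumptions. In practice a library implementation such as scikit-learn's `KMeans`, which uses k-means++ initialization and supports multiple restarts through `n_init`, would usually be preferred.

```python
import numpy as np

def kmeans(X, k=5, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X (one row per patient)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize centroids at k randomly chosen patients (sampled without replacement)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every patient to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic example: 60 patients x 100 binary gene columns
rng = np.random.default_rng(1)
genes = rng.integers(0, 2, size=(60, 100))
labels, centroids = kmeans(genes, k=5)
print(np.bincount(labels, minlength=5))
```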
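The evaluation in step (2) pairs standard accuracy with the proposed ratio of training loss to testing loss. A minimal sketch is shown below, assuming categorical cross-entropy as the underlying loss (the text does not name the loss function); the helper names and the probability values are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy of predicted class probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    labels = np.asarray(labels)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

def accuracy(probs, labels):
    """Fraction of patients whose most probable cluster is their assigned cluster."""
    return float((np.asarray(probs).argmax(axis=1) == np.asarray(labels)).mean())

def loss_ratio(train_loss, test_loss):
    """Training loss divided by testing loss."""
    return train_loss / test_loss

# Made-up softmax outputs for two training and two testing patients (2 clusters)
train_probs, train_y = [[0.9, 0.1], [0.2, 0.8]], [0, 1]
test_probs, test_y = [[0.6, 0.4], [0.3, 0.7]], [0, 1]

train_loss = cross_entropy(train_probs, train_y)
test_loss = cross_entropy(test_probs, test_y)
print(accuracy(test_probs, test_y), round(loss_ratio(train_loss, test_loss), 3))
```

Under this reading, a ratio close to 1 means the training and testing losses are similar, while a ratio well below 1 indicates the model fits its training clusters much better than held-out patients.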
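The kernel normalization step in step (2) can be illustrated by averaging pixel values over each k × k receptive field before any gradient is taken. The sketch assumes a 3 × 3 kernel, stride 1, and no padding, and uses a synthetic patch of about 40 HU with uniform pixel-level noise; it is one reading of "averaged for each receptive field according to the kernel", not the authors' code. Unlike a Gaussian blur, which weights pixels by their distance from the center, every pixel in the window contributes equally here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def receptive_field_mean(image, k=3):
    """Average pixel values over each k x k receptive field (stride 1, no padding)."""
    windows = sliding_window_view(image, (k, k))  # shape (H-k+1, W-k+1, k, k)
    return windows.mean(axis=(-2, -1))

def max_neighbor_gradient(image):
    """Largest absolute difference between horizontally or vertically adjacent pixels."""
    gx = np.abs(np.diff(image, axis=1)).max()
    gy = np.abs(np.diff(image, axis=0)).max()
    return float(max(gx, gy))

# Synthetic soft-tissue patch: about 40 HU with +/-10 HU pixel-to-pixel noise
rng = np.random.default_rng(0)
patch = 40 + rng.uniform(-10, 10, size=(32, 32))
averaged = receptive_field_mean(patch, k=3)

print(round(max_neighbor_gradient(patch), 1), round(max_neighbor_gradient(averaged), 1))
```

Adjacent 3 × 3 windows share six of their nine pixels, so the largest jump between neighboring averaged values is bounded by 60/9 HU here, far below the jumps between raw pixels.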
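The pooling, activation, and output stages of step (2) can be sketched in NumPy as max pooling, ReLU, a fully connected layer producing one score per genetic cluster, and softmax. Max pooling with a 2 × 2 window, 5 clusters, an 8 × 8 feature map, and random weights are assumptions for illustration (the text does not specify the pooling type or layer sizes), and the helper names are hypothetical. Softmax is consistent with Luce's choice axiom in the sense that the ratio of any two class probabilities, exp(s_i − s_j), does not depend on the other classes' scores.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling over a 2-D feature map (ragged edges cropped)."""
    h, w = fmap.shape[0] // size * size, fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

def softmax(scores):
    # Subtract the maximum for numerical stability; the output sums to 1
    z = np.exp(scores - scores.max())
    return z / z.sum()

def classify(fmap, weights, bias):
    """Pool -> ReLU -> fully connected layer -> softmax -> ranked clusters."""
    pooled = relu(max_pool2d(fmap)).ravel()
    scores = weights @ pooled + bias       # one score per genetic cluster
    probs = softmax(scores)
    ranking = np.argsort(probs)[::-1]      # most probable cluster first
    return probs, ranking

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8))                 # hypothetical feature map
n_clusters = 5
weights = rng.normal(size=(n_clusters, 16))    # 8x8 pooled by 2x2 -> 4x4 = 16 inputs
bias = np.zeros(n_clusters)
probs, ranking = classify(fmap, weights, bias)
print(ranking, probs.round(3))
```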
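Step (3) can be sketched as a loop that retrains, predicts, moves misclassified patients, and stops once accuracy reaches 90%. Two assumptions are made here: a misclassified patient is moved to the cluster the model predicted (the text says only "a new cluster"), and a nearest-centroid classifier on hypothetical per-patient feature vectors stands in for the CNN so that the example runs quickly. All helper names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_nearest_centroid(features, labels, k):
    """Stand-in 'model': the mean feature vector of each cluster.

    Empty clusters get an infinite centroid so no patient is assigned to them.
    """
    cents = np.full((k, features.shape[1]), np.inf)
    for j in range(k):
        members = features[labels == j]
        if len(members):
            cents[j] = members.mean(axis=0)
    return cents

def predict(cents, features):
    # Squared distance from every patient to every centroid; the nearest wins
    d = ((features[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def iterative_recluster(features, labels, k, target=0.90, max_rounds=100):
    """Retrain, move misclassified patients, and repeat until accuracy >= target."""
    labels = labels.copy()
    history = []
    for _ in range(max_rounds):
        model = fit_nearest_centroid(features, labels, k)
        preds = predict(model, features)
        acc = float((preds == labels).mean())
        history.append(acc)
        if acc >= target:
            break
        # Assumption: each misclassified patient moves to its predicted cluster
        wrong = preds != labels
        labels[wrong] = preds[wrong]
    return labels, history

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 8))      # hypothetical per-patient image features
start = rng.integers(0, 5, size=60)      # initial genetic cluster labels
final_labels, history = iterative_recluster(features, start, k=5)
print([round(a, 2) for a in history])
```

Because labels are updated from the model's own predictions and accuracy is measured on the same patients, accuracy rises largely by construction; with a nearest-centroid stand-in, the loop reduces to Lloyd's k-means iterations.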
2.5. Outcomes
2.6. Statistical Analysis
3. Results
3.1. Initial Genetic Clusters and Cancer Severity Data
3.2. Model Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Dataset | Cancer Type | Cancer Location | Total Patients | Imaging Available | Other Data Available |
---|---|---|---|---|---|
CPTAC-LUAD | Adenocarcinoma | Lung | 244 | CT, MR, PT, CR, pathology | Clinical, genomics, proteomics |
TCGA-LUSC | Lung Squamous Cell Carcinoma | Lung | 37 | CT, NM, PT, pathology | Clinical, genomics |
Total | -- | -- | 281 | -- | -- |

Outcome-Linked Genetic Mutation | Total Subjects |
---|---|
TP53 | 187 |
KRAS | 157 |
KEAP1 | 148 |
CDKN2 | 101 |
EGFR | 93 |
KDR | 89 |
NTRK | 86 |
STK1 | 71 |
ROS1 | 69 |
SMARCA | 65 |
ALK | 60 |
BRAF | 53 |

Characteristic | LUAD | LUSC |
---|---|---|
Average Age | 65.33 | 67.25 |
Ethnicity | ||
Not reported | 3% | 0% |
Hispanic or Latino | 1% | 2% |
Not Hispanic or Latino | 75% | 65% |
Not reported | 20% | 31% |
Unknown | 1% | 1% |
Gender | ||
Female | 54% | 24% |
Male | 44% | 76% |
Race | ||
Not reported | 3% | 0% |
American Indian or Alaska Native | 1% | 0% |
Asian | 1% | 1% |
Black or African American | 10% | 9% |
Not reported | 10% | 17% |
Unknown | 1% | 0% |
White | 75% | 72% |
Stages | ||
Not reported | 45% | 36% |
Stage I | 1% | 1% |
Stage IA | 12% | 11% |
Stage IB | 13% | 17% |
Stage II | 0% | 1% |
Stage IIA | 6% | 9% |
Stage IIB | 9% | 12% |
Stage III | 0% | 1% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Provenzano, D.; Lichtenberger, J.P.; Goyal, S.; Rao, Y.J. Decoding Lung Cancer Radiogenomics: A Custom Clustering/Classification Methodology to Simultaneously Identify Important Imaging Features and Relevant Genes. Appl. Sci. 2025, 15, 4053. https://doi.org/10.3390/app15074053