Machine Learning Models for Cancer Research: A Narrative Review of Bulk RNA-Seq Applications
Abstract
1. Introduction
Review Methodology
2. Preprocessing of RNA-Seq Data
3. Conventional ML Algorithms in Cancer Studies
| Cancer Type | Model Type | Datasets and Cohort Size | Sample Type | Feature Selection Methods | Model Algorithms | Metrics | Signature Size | External Validation | Ref. |
|---|---|---|---|---|---|---|---|---|---|
| Colorectal cancer | Diagnostic | TCGA-CRC (n = 695), GSE50760 (n = 54), external GSE142279 (n = 40) | Tumor and normal tissues | LASSO, gene expression, correlation and survival analyses | AdaBoost, RF, LR, GNB, SVM (final: RF) | Internal (TCGA-CRC): RF accuracy 100%; internal (GSE50760): RF accuracy 94.44%; external (GSE142279, 3-gene RF model): accuracy 100%; per-gene AUC 0.99–1.00 | 3 genes | Yes (GSE142279, RF classifier) | [40] |
| Colorectal cancer | Diagnostic | GSE10950, GSE25070, GSE41328, GSE74602, GSE142279: train (n = 154), validation (n = 66); external GSE21815 (n = 141), GSE106582 (n = 194) | Tumor and normal tissues | DEG, PPI (STRING), LASSO | RF, SVM, ANN, GBM | Internal (balanced train/validation): AUROC > 0.95 for RF, SVM, ANN and GBM; external GSE21815 (SVM): AUROC 0.982, accuracy 0.978; external GSE106582 (RF): AUROC 0.980, accuracy 0.958 | 9 genes | Yes (independent cohorts GSE21815, GSE106582) | [39] |
| Colorectal cancer | Diagnostic | GSE44861 (n = 111), GSE103512 (n = 69), TCGA-COAD/READ (n = 698) | Tumor and normal tissues | MCFS, Boruta, mRMR, LightGBM | SVM, XGBoost, RF, kNN, interpretable rules | 10-fold CV: accuracy 81.2–98.6%, AUC 0.82–1.00 (best models with AUC ≈ 0.99) | 33, 52, 49 genes | No | [43] |
| Lung adenocarcinoma | Diagnostic and prognostic | TCGA-LUAD (n ≈ 500), external GSE7670 (n = 54), GSE30219 (n ≈ 300), GSE31210 (n = 226), GSE50081 (n = 127), GSE37745 (n = 106) | Tumor and normal tissues | LASSO, RF, SVM | kNN, NB, RF, DT, SVM, XGBoost (diagnostic); LASSO–Cox (prognostic) | Diagnostic models: 10-fold CV on training set AUC-ROC 0.95–1.00, AUC-PRC 0.98–1.00, accuracy > 95%; test set AUC-ROC 0.93–1.00, AUC-PRC 0.99–1.00, accuracy > 95%. Prognostic model: time-dependent ROC AUC ≈ 0.78 at 1 year and ≈ 0.73 at 2–4 years | 13 markers in the diagnostic model, 12 genes in the prognostic model | Yes (GSE7670 for diagnostic model; GSE30219, GSE31210, GSE50081, GSE37745 for prognostic model) | [41] |
| Lung adenocarcinoma | Diagnostic | TCGA-LUAD (n = 549), external GSE81089 (n = 54) | Tumor and normal tissues | MI, RFE, RF | SVM, RF | Internal (TCGA test set, 12-gene model): accuracy ≈ 97.99%, balanced accuracy ≈ 90.66%, AUC ≈ 0.993; external (GSE81089): accuracy ≈ 96.29%, balanced accuracy ≈ 94.44%, AUC = 1.00 | 12 genes | Yes (GSE81089) | [42] |
| Multiple tumor types | Classification | Pan-pediatric RNA-seq cohorts covering 52 tumor types and 96 tumor subtypes, external KidsFirst pediatric tumor dataset | Tumor tissues | ANOVA (F-statistic), variance, PCA, RF feature importance | Weighted RF, weighted kNN, ensemble | Overall tumor-type classification: internal accuracy 94–99%, precision ≈ 99%, recall ≈ 80%; subtype classification: accuracy ≈ 86%; external KidsFirst cohort: precision 98%, recall 77% | 300 transcripts + PCA components | Yes (KidsFirst) | [44] |
| Multiple tumor types | Classification | UCI ML Repository (RNA-Seq HiSeq PANCAN, n = 801) | Tumor tissues | ANOVA F-test, PCA | Decision trees, SVM, LR, naive Bayes, kNN, ensembles, wide neural networks | Internal train/validation/test splits: best wide neural network reached accuracy 99.834% (validation) and 99.995% (test); several models reach accuracy up to 100% with AUC up to 1.00 (five-class tumor-type classification) | 16,345 features after selection + PCA (95% explained variance) | No | [45] |
| Breast cancer | Classification and prognostic | TCGA-BRCA (n = 376), CCLE (n = 80) | Tumor tissues and cell lines | Alternative splicing (PSI of cassette exons), differential splicing, Boruta (RF importance) | Semi-supervised RF (cell-to-patient transfer learning), univariate Cox regression | Cell-to-patient RF classifier using a 25-event splicing signature stratified basal-like patients: 5-year DSS HR = 4.87 (95% CI 1.37–17.28), log-rank p = 0.0067 | 25 splicing events | No | [46] |
| Lung squamous cell carcinoma | Prognostic and predictive | TCGA-LUSC (n = 478), external GSE30219, GSE37745, GSE73403 (combined n = 200) | Tumor tissues | LASSO, RF | Multivariate LR | TCGA training set: AUC = 0.967, accuracy 89.9%; TCGA testing set: AUC = 0.956, accuracy 88.9%; external GEO cohorts: classifier separates two stemness subtypes with clearly different OS | 9 genes | Yes (GSE30219, GSE37745, GSE73403) | [47] |
| Pancreatic cancer | Prognostic | TCGA-PAAD (n = 182), GTEx-PAAD (n = 167), external GSE62452 (n = 130), GSE28735 (n = 90) | Tumor and normal tissues | Univariate Cox, LASSO, Gaussian finite mixture model | Multivariate Cox regression, Kaplan–Meier | TCGA-PAAD: 5-gene risk score AUC ≈ 0.75 for OS; external GSE62452 + GSE28735: AUC ≈ 0.91 | 5 genes | Yes (GSE62452, GSE28735) | [48] |
| Diffuse large B-cell lymphoma | Prognostic | GSE10846 (n = 233), GSE23501 (n = 64) | Tumor tissues | Univariate Cox, Mclust, RF | RF survival model | C-index = 0.84 (training), 0.79 (validation) | 54 genes | Yes (GSE23501) | [49] |
| Prostate cancer | Prognostic | RNA-Seq LAPC cohort from Russian patients (n = 73), independent FFPE LAPC cohort (n = 37) | Tumor tissues (fresh frozen and FFPE) | DEGs, statistical tests | LR, LightGBM, CatBoost, RF, XGBoost | Best model “CST2 + OCLN + pT” (CatBoost): internal AUC ≈ 1; external FFPE cohort: AUC = 0.863, accuracy 0.81, sensitivity 0.83, specificity 0.79 | 2 genes + clinical feature (pT) | Yes (independent FFPE cohort) | [13] |
| Lung adenocarcinoma | Prognostic | TCGA-LUAD (n = 547), external GSE31210, GSE30219, GSE50081 (total n = 369) | Tumor tissues | Univariate Cox, correlation analysis, LASSO, RSF, Elastic Net | Elastic Net Cox model (27-lncRNA risk score) | TCGA-LUAD: C-index = 0.677 (95% CI 0.63–0.73), time-dependent AUC = 0.76 (1 year), 0.72 (2 years), 0.74 (3 years); external GSE31210, GSE30219, GSE50081: C-index > 0.67 and significant OS/DFS separation between risk groups | 27 lncRNAs | Yes (GSE31210, GSE30219, GSE50081) | [50] |
| Clear cell renal cell carcinoma | Prognostic | TCGA-KIRC (n = 614) | Tumor tissues | Univariate Cox, LASSO, multivariate Cox | LASSO-Cox | TCGA-KIRC: C-index = 0.783 (95% CI 0.775–0.816), time-dependent AUC = 0.725 (1 year), 0.718 (3 years), 0.762 (5 years) | 4 lncRNAs | No | [51] |
| Breast cancer | Prognostic | TCGA-BRCA (n = 1113), external GSE20685 (n = 327), IMvigor210, GSE78220 | Tumor tissues | Univariate Cox, RSF, stepwise multivariate Cox | Multivariate Cox | TCGA-BRCA: time-dependent AUC for OS = 0.78 (1 year), 0.67 (3 years), 0.63 (5 years), nomogram C-index ≈ 0.79; external GSE20685: 1-/3-/5-year AUC ≈ 0.81/0.66/0.70 | 5 genes | Yes (GSE20685) | [52] |
| Multiple tumor types | Prognostic and predictive | TCGA with clinical drug response data (five cancer types); 5-FU model: train (n = 58), validation (n = 17); gemcitabine (GCB) model: train (n = 92), validation (n = 28) | Tumor tissues | Clara (OptCluster), RF | RF | Pan-cancer RF models: CV AUC = 0.98 for both 5-FU and GCB; independent validation: 5-FU model AUC 0.56, accuracy 52.9%; GCB model AUC 0.71, accuracy 85.7% | 2 gene clusters | No | [53] |
| Gastroenteropancreatic neuroendocrine tumors | Diagnostic and prognostic | GSE98894 (n = 182), GSE118014 (n = 32) | Tumor tissues | mRMRe | RF, SVM, LDA, GBM, XGBoost, kNN, CART | RF-based models for hepatic metastasis and primary site: internal accuracy up to 100%; external validation (GSE118014): accuracy > 90% for metastasis classification and > 95% for primary-site prediction | 21 genes (9 for metastasis, 12 for primary tumor) | Yes (GSE118014) | [54] |
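Several prognostic models in the table above ([49], [50], [51], [52]) are summarized by Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event first, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The minimal sketch below computes it from observed times, event indicators and model risk scores. It assumes right-censored data, counts tied risk scores as half-concordant and skips pairs with tied observed times; this is one common convention, not necessarily the one used in the cited studies, and the function name and toy values are illustrative only.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    A pair is comparable when the patient with the shorter observed time had
    the event (events == 1). It is concordant when that patient also has the
    higher risk score; tied risk scores count as 0.5. Pairs with identical
    observed times are skipped (a simplifying assumption).
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy cohort: times in months, 1 = event observed, 0 = censored.
    times = [5, 8, 12, 20, 25]
    events = [1, 1, 0, 1, 0]
    risk = [0.9, 0.3, 0.4, 0.7, 0.1]
    print(round(harrell_c_index(times, events, risk), 3))  # prints 0.75
```

Enumerating every pair makes the cost quadratic in cohort size, which is acceptable for cohorts of a few hundred patients. Survival-analysis packages such as lifelines and scikit-survival provide tested implementations; their tie handling and score orientation should be checked before comparing values with published C-indices.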
4. Deep Learning Algorithms for Cancer Diagnostics, Prognostics and Classification
| Model | Model Type | Datasets and Cohort Size | Sample Type | Model Algorithms | Metrics | Interpretability | Key Model Features | External Validation | Reference |
|---|---|---|---|---|---|---|---|---|---|
| DeepCC | Cancer subtyping, classification | CRC: TCGA-CRC train (n = 456), 13 external CRC cohorts (total n ≈ 3122); BRCA: TCGA-BRCA (n = 517) + multiple external microarray cohorts (n = 230) | Tumor tissues | Feedforward ANN with hidden layers; input: functional spectra from GSEA | CRC subtyping (TCGA-CRC): balanced accuracy > 90%; BRCA intrinsic subtype classification (TCGA-BRCA): balanced accuracy > 80% | Yes (deep features analyzed via Pearson correlation with GSEA scores; clustering identifies biological processes) | Platform-independent; transforms expression to functional spectra for robustness to batch effects; single-sample prediction with adaptive rescaling; learns hierarchical features | Yes (13 independent CRC datasets; 4 breast cancer datasets) | [58] |
| DeSide | Cellular deconvolution | TCGA tumor samples (n = 7699); 12 merged scRNA-seq datasets from tumor tissues and cancer cell lines (n = 325,474 single cells) | Tumor tissues and cancer cell lines | DNN with two 7-layer MLPs | DeSide-derived cell-type scores significantly stratify overall survival in multiple TCGA cohorts (Cox p < 0.005) | Yes (integration of biological pathways for feature extraction) | Unified model for multiple solid tumors; predicts non-cancer cells first; handles intra- and inter-tumor heterogeneity | Yes (SC_HNSC, SC_GBM, SC_OV, GSE184398) | [63] |
| PCA-AE-Ada | Prognosis | GSE2034 train/validation n = 266, external: GSE4922 n = 133, GSE6532 n = 100, GSE7390 n = 190, GSE11121 n = 182 | Tumor tissues | PCA + stacked autoencoder; concatenated features to AdaBoost | 10 × 5-fold CV on GSE2034 and independent tests on four GEO cohorts: ACC 0.72–0.85, AUC 0.68–0.74 (mean AUC ≈ 0.71), sensitivity 0.68–0.84, specificity 0.55–0.66 | No | Combines linear (PCA) + non-linear (autoencoder) feature learning with ensemble boosting; handles high-dimensional, noisy, heterogeneous data; alleviates class imbalance; data alignment via zero-padding | Yes (4 independent GEO datasets: GSE11121, GSE1456, GSE4922, GSE6532) | [65] |
| TransCUPtomics | Diagnostics, classification | Reference: 20,918 RNA-seq samples (39 cancer types, 55 normal tissues) from TCGA, GTEx, Human Protein Atlas, GSE60052, GSE118014; CUP cohort: 48 patients (37 retrospective, 11 prospective) | Fresh-frozen tumor tissue | VAE + RF + k-NN | Reference set: overall accuracy ≈ 96% for tissue-of-origin prediction (three-fold cross-validation); CUP cohort: tissue of origin predicted in 79% (38/48) cases; in the prospective cohort, 7/8 (87.5%) patients treated according to TransCUPtomics showed objective responses | Yes (VAE latent features via gene weights and GO analysis) | Integrates TOO with gene fusion and variant detection; confidence scoring; handles normal cell contamination; UMAP visualization | Yes (37 retrospective + 11 prospective CUP patients) | [66] |
| Lightweight CNN | Classification | TCGA-BRCA (n = 1208) | Tumor and normal tissues | Lightweight CNN with 2 convolutional layers, max pooling, batch normalization, dense layers | 5-fold CV on TCGA-BRCA: Accuracy 98.8%, Sensitivity 91.4%, Specificity 100%, Precision 100%, F1 = 0.955, AUC = 0.998 | No | Transforms gene expression profiles into 2D images; edge detection for visualization; lightweight design for small sample sizes and high dimensionality; GC normalization, gene filtering | No | [67] |
| TULIP | Classification (primary tumor type prediction) | TCGA (32 types, n = 10,940) | Tumor tissue | 1D-CNN with 2 conv layers, max pooling, 2 FC layers, dropout 10% | Internal train/validation/test splits: overall test accuracy 94.7–97.6%; weighted precision/recall/F1 ≥ 0.92; accuracy > 90% for 16 of 17 tumor types (17-type model) and ≥ 80% for 28–29 of 32 tumor types (32-type model); external CPTAC kidney cohort: all 277 samples classified as kidney cancer | No | Python tool (built with Python 3.7.12) for predicting the primary type of unknown tumors | Yes (CPTAC kidney cancer) | [68] |
| EOSA-CNN | Classification | TCGA-BRCA (n = 1208) | Tumor and normal tissues | Hybrid: CNN optimized by Ebola Optimization Search Algorithm (EOSA) | Balanced accuracy = 0.959, overall accuracy = 98.3%, precision ≈ 0.90, recall/sensitivity ≈ 0.93, F1 ≈ 0.91, Cohen’s kappa ≈ 0.90; class-wise sensitivity and F1 for tumor class ≈ 99% | No | Bio-inspired optimization for high dimensionality and imbalance; outperforms standalone CNN, GA-CNN, WOA-CNN | No | [69] |
| HHWO-DL | Classification | RNA-seq data from 66 paired samples (55 patients + 11 controls) | Paired tumor and normal tissues | Hybrid gene selection: mRMR + HHWO optimization; multi-layer feedforward NN | Mean accuracy ≈ 99%, ROC AUC ≈ 0.99 | No | Hybrid optimization for gene selection reduces dimensionality and helps avoid local optima | No | [70] |
| CancerSiamese | Classification | TCGA primary tumors (29 types; n = 10,340 samples) and MET500 metastatic cohort (20 types; n = 765 samples); training on 19 primary and 10 metastatic cancer types, testing on disjoint unseen types | Tumor tissues | Siamese CNN with parallel 1D-CNNs joined by similarity metric network; trained on paired samples | Unseen primary tumors: one-shot 6-/8-/10-way accuracy 89.67%, 87.32%, 84.59%; metastatic tumors: ≈63–66% accuracy | Yes (Guided Backpropagation Saliency Maps for gene significance; stepwise greedy forward selection for markers (top 100 primary, 200 metastatic); DAVID for GO/KEGG/Biocarta pathways) | One-shot learning for primary and metastatic tumor types unseen in training; network transfer learning from pretrained 1D-CNN | Yes (unseen cancer types disjoint from training; cross-application) | [71] |
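CancerSiamese [71] is assessed with one-shot N-way tasks: a query tumor is compared with a single reference ("support") sample from each of N candidate tumor types that were not seen during training, and the type of the most similar support sample is predicted. The sketch below reproduces only this evaluation protocol under loudly stated assumptions: `identity_embed` is a hypothetical placeholder for the trained Siamese 1D-CNN encoder, cosine similarity stands in for the learned similarity-metric network, and the three-gene toy profiles are invented for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identity_embed(profile: Vector) -> Vector:
    """Hypothetical placeholder for a trained Siamese encoder."""
    return profile


def one_shot_predict(query: Vector, support: Dict[str, Vector],
                     embed: Callable[[Vector], Vector]) -> str:
    """Predict the type whose single support sample is most similar to the query."""
    q = embed(query)
    return max(support, key=lambda t: cosine_similarity(q, embed(support[t])))


def n_way_accuracy(samples_by_type: Dict[str, List[Vector]], n_way: int,
                   n_tasks: int, embed: Callable[[Vector], Vector] = identity_embed,
                   seed: int = 0) -> float:
    """Mean accuracy over randomly drawn one-shot N-way tasks.

    Each task samples N types, one support sample per type, and one query
    from one of those types that differs from that type's support sample,
    so every type needs at least two samples.
    """
    if n_way > len(samples_by_type):
        raise ValueError("n_way exceeds the number of available types")
    rng = random.Random(seed)
    types = sorted(samples_by_type)
    correct = 0
    for _ in range(n_tasks):
        chosen = rng.sample(types, n_way)
        true_type = rng.choice(chosen)
        # The query and its support sample share a type but are different samples.
        support_sample, query = rng.sample(samples_by_type[true_type], 2)
        support = {t: rng.choice(samples_by_type[t]) for t in chosen if t != true_type}
        support[true_type] = support_sample
        if one_shot_predict(query, support, embed) == true_type:
            correct += 1
    return correct / n_tasks


if __name__ == "__main__":
    # Toy three-gene profiles: each type over-expresses a different gene.
    toy = {
        "type_A": [[10, 1, 1], [9, 2, 1], [11, 1, 2]],
        "type_B": [[1, 10, 1], [2, 9, 1], [1, 11, 2]],
        "type_C": [[1, 1, 10], [1, 2, 9], [2, 1, 11]],
    }
    print(n_way_accuracy(toy, n_way=3, n_tasks=100))
```

Under this protocol chance-level accuracy is 1/N, so the reported 89.67%, 87.32% and 84.59% should be read against baselines of roughly 16.7%, 12.5% and 10% for the 6-, 8- and 10-way tasks.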
5. Current Trends in the Development of Cancer ML Models Based on Transcriptomic Data
6. Practical Recommendations for Developing Cancer ML Models Based on Bulk RNA-Seq Data
7. Challenges and Future Prospects
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Siegel, R.L.; Giaquinto, A.N.; Jemal, A. Cancer statistics, 2024. CA Cancer J. Clin. 2024, 74, 12–49.
2. Proietto, M.; Crippa, M.; Damiani, C.; Pasquale, V.; Sacco, E.; Vanoni, M.; Gilardi, M. Tumor heterogeneity: Preclinical models, emerging technologies, and future applications. Front. Oncol. 2023, 13, 1164535.
3. Wang, Y.; Mashock, M.; Tong, Z.; Mu, X.; Chen, H.; Zhou, X.; Zhang, H.; Zhao, G.; Liu, B.; Li, X. Changing Technologies of RNA Sequencing and Their Applications in Clinical Oncology. Front. Oncol. 2020, 10, 447.
4. Sager, M.; Yeat, N.C.; Pajaro-Van der Stadt, S.; Lin, C.; Ren, Q.; Lin, J. Transcriptomics in cancer diagnostics: Developments in technology, clinical research and commercialization. Expert Rev. Mol. Diagn. 2015, 15, 1589–1603.
5. Farris, S.; Wang, Y.; Ward, J.M.; Dudek, S.M. Optimized Method for Robust Transcriptome Profiling of Minute Tissues Using Laser Capture Microdissection and Low-Input RNA-Seq. Front. Mol. Neurosci. 2017, 10, 185.
6. Haque, A.; Engel, J.; Teichmann, S.A.; Lonnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 2017, 9, 75.
7. Stahl, P.L.; Salmen, F.; Vickovic, S.; Lundmark, A.; Navarro, J.F.; Magnusson, J.; Giacomello, S.; Asp, M.; Westholm, J.O.; Huss, M.; et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 2016, 353, 78–82.
8. Cancer Genome Atlas Research Network; Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013, 45, 1113–1120.
9. Clough, E.; Barrett, T. The Gene Expression Omnibus Database. Methods Mol. Biol. 2016, 1418, 93–110.
10. Larranaga, P.; Calvo, B.; Santana, R.; Bielza, C.; Galdiano, J.; Inza, I.; Lozano, J.A.; Armananzas, R.; Santafe, G.; Perez, A.; et al. Machine learning in bioinformatics. Brief. Bioinform. 2006, 7, 86–112.
11. Bostanci, E.; Kocak, E.; Unal, M.; Guzel, M.S.; Acici, K.; Asuroglu, T. Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer. Sensors 2023, 23, 80.
12. Cheng, Y.; Xu, S.M.; Santucci, K.; Lindner, G.; Janitz, M. Machine learning and related approaches in transcriptomics. Biochem. Biophys. Res. Commun. 2024, 724, 150225.
13. Pudova, E.A.; Kobelyatskaya, A.A.; Katunina, I.V.; Snezhkina, A.V.; Fedorova, M.S.; Pavlov, V.S.; Bakhtogarimov, I.R.; Lantsova, M.S.; Kokin, S.P.; Nyushko, K.M.; et al. Lymphatic Dissemination in Prostate Cancer: Features of the Transcriptomic Profile and Prognostic Models. Int. J. Mol. Sci. 2023, 24, 2418.
14. Kobelyatskaya, A.A.; Kudryavtsev, A.A.; Kudryavtseva, A.V.; Snezhkina, A.V.; Fedorova, M.S.; Kalinin, D.V.; Pavlov, V.S.; Guvatova, Z.G.; Naberezhnev, P.A.; Nyushko, K.M.; et al. ALDH3A2, ODF2, QSOX2, and MicroRNA-503-5p Expression to Forecast Recurrence in TMPRSS2-ERG-Positive Prostate Cancer. Int. J. Mol. Sci. 2022, 23, 11695.
15. Chen, T.; Kabir, M.F. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data. PLoS ONE 2024, 19, e0302947.
16. Bullard, J.H.; Purdom, E.; Hansen, K.D.; Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinform. 2010, 11, 94.
17. Hansen, K.D.; Brenner, S.E.; Dudoit, S. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 2010, 38, e131.
18. Robinson, M.D.; Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11, R25.
19. Li, J.; Witten, D.M.; Johnstone, I.M.; Tibshirani, R. Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics 2012, 13, 523–538.
20. Johnson, W.E.; Li, C.; Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007, 8, 118–127.
21. Nueda, M.J.; Ferrer, A.; Conesa, A. ARSyN: A method for the identification and removal of systematic noise in multifactorial time course microarray experiments. Biostatistics 2012, 13, 553–566.
22. Ben Salem, K.; Ben Abdelaziz, A. Principal Component Analysis (PCA). Tunis. Med. 2021, 99, 383–389.
23. Hamamoto, R.; Takasawa, K.; Machino, H.; Kobayashi, K.; Takahashi, S.; Bolatkan, A.; Shinkai, N.; Sakai, A.; Aoyama, R.; Yamada, M.; et al. Application of non-negative matrix factorization in oncology: One approach for establishing precision medicine. Brief. Bioinform. 2022, 23, bbac246.
24. McConn, J.L.; Lamoureux, C.R.; Poudel, S.; Palsson, B.O.; Sastry, A.V. Optimal dimensionality selection for independent component analysis of transcriptomic data. BMC Bioinform. 2021, 22, 584.
25. Han, S.; Wang, N.; Guo, Y.; Tang, F.; Xu, L.; Ju, Y.; Shi, L. Application of Sparse Representation in Bioinformatics. Front. Genet. 2021, 12, 810875.
26. Kobak, D.; Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 2019, 10, 5416.
27. Aragones, D.G.; Palomino-Segura, M.; Sicilia, J.; Crainiciuc, G.; Ballesteros, I.; Sanchez-Cabo, F.; Hidalgo, A.; Calvo, G.F. Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks. Comput. Biol. Med. 2024, 168, 107827.
28. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection. ACM Comput. Surv. 2017, 50, 1–45.
29. Lee, J.Y.; Lee, K.S.; Seo, B.K.; Cho, K.R.; Woo, O.H.; Song, S.E.; Kim, E.K.; Lee, H.Y.; Kim, J.S.; Cha, J. Radiomic machine learning for predicting prognostic biomarkers and molecular subtypes of breast cancer using tumor heterogeneity and angiogenesis properties on MRI. Eur. Radiol. 2022, 32, 650–660.
30. Wang, T.H.; Lee, C.Y.; Lee, T.Y.; Huang, H.D.; Hsu, J.B.; Chang, T.H. Biomarker Identification through Multiomics Data Analysis of Prostate Cancer Prognostication Using a Deep Learning Model and Similarity Network Fusion. Cancers 2021, 13, 2528.
31. Ma, B.; Geng, Y.; Meng, F.; Yan, G.; Song, F. Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method. J. Cancer 2020, 11, 1288–1298.
32. Alharbi, F.; Vakanski, A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering 2023, 10, 173.
33. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312.
34. Patel, A.J.; Tan, T.M.; Richter, A.G.; Naidu, B.; Blackburn, J.M.; Middleton, G.W. A highly predictive autoantibody-based biomarker panel for prognosis in early-stage NSCLC with potential therapeutic implications. Br. J. Cancer 2022, 126, 238–246.
35. Arora, C.; Kaur, D.; Naorem, L.D.; Raghava, G.P.S. Prognostic biomarkers for predicting papillary thyroid carcinoma patients at high risk using nine genes of apoptotic pathway. PLoS ONE 2021, 16, e0259534.
36. Yuan, H.; Yan, M.; Zhang, G.; Liu, W.; Deng, C.; Liao, G.; Xu, L.; Luo, T.; Yan, H.; Long, Z.; et al. CancerSEA: A cancer single-cell state atlas. Nucleic Acids Res. 2019, 47, D900–D908.
37. Yu, S.H.; Cai, J.H.; Chen, D.L.; Liao, S.H.; Lin, Y.Z.; Chung, Y.T.; Tsai, J.J.P.; Wang, C.C.N. LASSO and Bioinformatics Analysis in the Identification of Key Genes for Prognostic Genes of Gynecologic Cancer. J. Pers. Med. 2021, 11, 1177.
38. Torang, A.; Gupta, P.; Klinke, D.J., 2nd. An elastic-net logistic regression approach to generate classifiers and gene signatures for types of immune cells and T helper cell subsets. BMC Bioinform. 2019, 20, 433.
39. Vaziri-Moghadam, A.; Foroughmand-Araabi, M.H. Integrating machine learning and bioinformatics approaches for identifying novel diagnostic gene biomarkers in colorectal cancer. Sci. Rep. 2024, 14, 24786.
40. Maurya, N.S.; Kushwaha, S.; Vetukuri, R.R.; Mani, A. Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms. Genes 2023, 14, 1836.
41. Lin, L.; Bao, Y. Development and validation of machine learning models for diagnosis and prognosis of lung adenocarcinoma, and immune infiltration analysis. Sci. Rep. 2024, 14, 22081.
42. Abdelwahab, O.; Awad, N.; Elserafy, M.; Badr, E. A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma. PLoS ONE 2022, 17, e0269126.
43. Wei, W.; Li, Y.; Huang, T. Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers. Int. J. Mol. Sci. 2023, 24, 11133.
44. Wallis, F.S.A.; Baker-Hernandez, J.L.; van Tuil, M.; van Hamersveld, C.; Koudijs, M.J.; Verwiel, E.T.P.; Janse, A.; Hiemcke-Jiwa, L.S.; de Krijger, R.R.; Kranendonk, M.E.G.; et al. M&M: An RNA-seq based pan-cancer classifier for paediatric tumours. eBioMedicine 2025, 111, 105506.
45. Alanazi, S.A.; Alshammari, N.; Alruwaili, M.; Junaid, K.; Abid, M.R.; Ahmad, F. Integrative analysis of RNA expression data unveils distinct cancer types through machine learning techniques. Saudi J. Biol. Sci. 2024, 31, 103918.
46. Villemin, J.P.; Lorenzi, C.; Cabrillac, M.S.; Oldfield, A.; Ritchie, W.; Luco, R.F. A cell-to-patient machine learning transfer approach uncovers novel basal-like breast cancer prognostic markers amongst alternative splice variants. BMC Biol. 2021, 19, 70.
47. Lai, J.; Lin, X.; Zheng, H.; Xie, B.; Fu, D. Characterization of stemness features and construction of a stemness subtype classifier to predict survival and treatment responses in lung squamous cell carcinoma. BMC Cancer 2023, 23, 525.
48. Zhang, X.; Yang, L.; Zhang, D.; Wang, X.; Bu, X.; Zhang, X.; Cui, L. Prognostic assessment capability of a five-gene signature in pancreatic cancer: A machine learning based-study. BMC Gastroenterol. 2023, 23, 68.
49. Mosquera Orgueira, A.; Diaz Arias, J.A.; Cid Lopez, M.; Peleteiro Raindo, A.; Antelo Rodriguez, B.; Aliste Santos, C.; Alonso Vence, N.; Bendana Lopez, A.; Abuin Blanco, A.; Bao Perez, L.; et al. Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling. BMC Cancer 2020, 20, 1017.
50. Pan, Y.; Jin, X.; Xu, H.; Hong, J.; Li, F.; Luo, T.; Zeng, J. Developing a prognostic model using machine learning for disulfidptosis related lncRNA in lung adenocarcinoma. Sci. Rep. 2024, 14, 13113; Erratum in Sci. Rep. 2024, 14, 13809. https://doi.org/10.1038/s41598-024-64894-9.
51. Chen, R.; Wu, J.; Che, Y.; Jiao, Y.; Sun, H.; Zhao, Y.; Chen, P.; Meng, L.; Zhao, T. Machine learning-driven prognostic analysis of cuproptosis and disulfidptosis-related lncRNAs in clear cell renal cell carcinoma: A step towards precision oncology. Eur. J. Med. Res. 2024, 29, 176.
52. Li, J.; Qiao, H.; Wu, F.; Sun, S.; Feng, C.; Li, C.; Yan, W.; Lv, W.; Wu, H.; Liu, M.; et al. A novel hypoxia- and lactate metabolism-related signature to predict prognosis and immunotherapy responses for breast cancer by integrating machine learning and bioinformatic analyses. Front. Immunol. 2022, 13, 998140.
53. Clayton, E.A.; Pujol, T.A.; McDonald, J.F.; Qiu, P. Leveraging TCGA gene expression data to build predictive models for cancer drug response. BMC Bioinform. 2020, 21, 364.
54. Padwal, M.K.; Basu, S.; Basu, B. Application of Machine Learning in Predicting Hepatic Metastasis or Primary Site in Gastroenteropancreatic Neuroendocrine Tumors. Curr. Oncol. 2023, 30, 9244–9261.
55. Guan, X.; Du, Y.; Ma, R.; Teng, N.; Ou, S.; Zhao, H.; Li, X. Construction of the XGBoost model for early lung cancer prediction based on metabolic indices. BMC Med. Inform. Decis. Mak. 2023, 23, 107.
56. Tan, Y.; Zhang, W.H.; Huang, Z.; Tan, Q.X.; Zhang, Y.M.; Wei, C.Y.; Feng, Z.B. AI models predicting breast cancer distant metastasis using LightGBM with clinical blood markers and ultrasound maximum diameter. Sci. Rep. 2024, 14, 15561.
57. Boehm, K.M.; Khosravi, P.; Vanguri, R.; Gao, J.; Shah, S.P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 2022, 22, 114–126.
58. Gao, F.; Wang, W.; Tan, M.; Zhu, L.; Zhang, Y.; Fessler, E.; Vermeulen, L.; Wang, X. DeepCC: A novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis 2019, 8, 44.
59. Guinney, J.; Dienstmann, R.; Wang, X.; de Reynies, A.; Schlicker, A.; Soneson, C.; Marisa, L.; Roepman, P.; Nyamundanda, G.; Angelino, P.; et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 2015, 21, 1350–1356.
60. Chia, S.K.; Bramwell, V.H.; Tu, D.; Shepherd, L.E.; Jiang, S.; Vickery, T.; Mardis, E.; Leung, S.; Ung, K.; Pritchard, K.I.; et al. A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen. Clin. Cancer Res. 2012, 18, 4465–4472.
61. Krijgsman, O.; Roepman, P.; Zwart, W.; Carroll, J.S.; Tian, S.; de Snoo, F.A.; Bender, R.A.; Bernards, R.; Glas, A.M. A diagnostic gene profile for molecular subtyping of breast cancer associated with treatment response. Breast Cancer Res. Treat. 2012, 133, 37–47.
62. Dowsett, M.; Cuzick, J.; Wale, C.; Forbes, J.; Mallon, E.A.; Salter, J.; Quinn, E.; Dunbier, A.; Baum, M.; Buzdar, A.; et al. Prediction of risk of distant recurrence using the 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: A TransATAC study. J. Clin. Oncol. 2010, 28, 1829–1834.
63. Xiong, X.; Liu, Y.; Pu, D.; Yang, Z.; Bi, Z.; Tian, L.; Li, X. DeSide: A unified deep learning approach for cellular deconvolution of tumor microenvironment. Proc. Natl. Acad. Sci. USA 2024, 121, e2407096121.
64. Lai, Y.H.; Chen, W.N.; Hsu, T.C.; Lin, C.; Tsao, Y.; Wu, S. Overall survival prediction of non-small cell lung cancer by integrating microarray and clinical data with deep learning. Sci. Rep. 2020, 10, 4679.
65. Zhang, D.; Zou, L.; Zhou, X.; He, F. Integrating Feature Selection and Feature Extraction Methods with Deep Learning to Predict Clinical Outcome of Breast Cancer. IEEE Access 2018, 6, 28936–28944.
66. Vibert, J.; Pierron, G.; Benoist, C.; Gruel, N.; Guillemot, D.; Vincent-Salomon, A.; Le Tourneau, C.; Livartowski, A.; Mariani, O.; Baulande, S.; et al. Identification of Tissue of Origin and Guided Therapeutic Applications in Cancers of Unknown Primary Using Deep Learning and RNA Sequencing (TransCUPtomics). J. Mol. Diagn. 2021, 23, 1380–1392.
67. Elbashir, M.K.; Ezz, M.; Mohammed, M.; Saloum, S.S. Lightweight Convolutional Neural Network for Breast Cancer Classification Using RNA-Seq Gene Expression Data. IEEE Access 2019, 7, 185338–185348.
68. Jones, S.; Beyers, M.; Shukla, M.; Xia, F.; Brettin, T.; Stevens, R.; Weil, M.R.; Ranganathan Ganakammal, S. TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks. Cancer Inform. 2022, 21, 11769351221139491.
69. Mohamed, T.I.A.; Ezugwu, A.E.; Fonou-Dombeu, J.V.; Ikotun, A.M.; Mohammed, M. A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data. Sci. Rep. 2023, 13, 14644.
70. Yaqoob, A.; Verma, N.K.; Aziz, R.M.; Shah, M.A. RNA-Seq analysis for breast cancer detection: A study on paired tissue samples using hybrid optimization and deep learning techniques. J. Cancer Res. Clin. Oncol. 2024, 150, 455.
71. Mostavi, M.; Chiu, Y.C.; Chen, Y.; Huang, Y. CancerSiamese: One-shot learning for predicting primary and metastatic tumor types unseen during model training. BMC Bioinform. 2021, 22, 244.
72. Li, X.; Wang, C.-Y. From bulk, single-cell to spatial RNA sequencing. Int. J. Oral Sci. 2021, 13, 36.
73. Del Giudice, M.; Peirone, S.; Perrone, S.; Priante, F.; Varese, F.; Tirtei, E.; Fagioli, F.; Cereda, M. Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology. Int. J. Mol. Sci. 2021, 22, 4563.
74. Molla Desta, G.; Birhanu, A.G. Advancements in single-cell RNA sequencing and spatial transcriptomics: Transforming biomedical research. Acta Biochim. Pol. 2025, 72, 13922.
75. Huang, C.; Liu, Z.; Guo, Y.; Wang, W.; Yuan, Z.; Guan, Y.; Pan, D.; Hu, Z.; Sun, L.; Fu, Z.; et al. scCancerExplorer: A comprehensive database for interactively exploring single-cell multi-omics data of human pan-cancer. Nucleic Acids Res. 2025, 53, D1526–D1535.
76. Han, Y.; Wang, Y.; Dong, X.; Sun, D.; Liu, Z.; Yue, J.; Wang, H.; Li, T.; Wang, C. TISCH2: Expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment. Nucleic Acids Res. 2023, 51, D1425–D1431.
77. Li, S.; Hua, H.; Chen, S. Graph neural networks for single-cell omics data: A review of approaches and applications. Brief. Bioinform. 2025, 26, bbaf109.
78. Yates, J.; Van Allen, E.M. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 2025, 43, 708–727.
79. Gogoshin, G.; Rodin, A.S. Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends. Cancers 2023, 15, 5858.
80. Wang, J.; Ma, A.; Chang, Y.; Gong, J.; Jiang, Y.; Qi, R.; Wang, C.; Fu, H.; Ma, Q.; Xu, D. scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat. Commun. 2021, 12, 1882; Erratum in Nat. Commun. 2022, 13, 2554. https://doi.org/10.1038/s41467-022-30331-6.
81. Zhang, D.-H.; Liang, C.; Hu, S.-Y.; Huang, X.-Y.; Yu, L.; Meng, X.-L.; Guo, X.-J.; Zeng, H.-Y.; Chen, Z.; Zhang, L.; et al. Application of a single-cell-RNA-based biological-inspired graph neural network in diagnosis of primary liver tumors. J. Transl. Med. 2024, 22, 883.
82. Wang, Y.; Chen, X.; Zheng, Z.; Huang, L.; Xie, W.; Wang, F.; Zhang, Z.; Wong, K.-C. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024, 27, 109352.
- Hao, M.; Gong, J.; Zeng, X.; Liu, C.; Guo, Y.; Cheng, X.; Wang, T.; Ma, J.; Zhang, X.; Song, L. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 2024, 21, 1481–1491. [Google Scholar] [CrossRef]
- Zaitsev, A.; Chelushkin, M.; Dyikanov, D.; Cheremushkin, I.; Shpak, B.; Nomie, K.; Zyrin, V.; Nuzhdina, E.; Lozinsky, Y.; Zotova, A.; et al. Precise reconstruction of the TME using bulk RNA-seq and a machine learning algorithm trained on artificial transcriptomes. Cancer Cell 2022, 40, 879–894.e16.
- Sinha, S.; Vegesna, R.; Mukherjee, S.; Kammula, A.V.; Dhruba, S.R.; Wu, W.; Kerr, D.L.; Nair, N.U.; Jones, M.G.; Yosef, N.; et al. PERCEPTION predicts patient response and resistance to treatment using single-cell transcriptomics of their tumors. Nat. Cancer 2024, 5, 938–952.
- Sartori, F.; Codicè, F.; Caranzano, I.; Rollo, C.; Birolo, G.; Fariselli, P.; Pancotti, C. A Comprehensive Review of Deep Learning Applications with Multi-Omics Data in Cancer Research. Genes 2025, 16, 648.
- Chakraborty, S.; Sharma, G.; Karmakar, S.; Banerjee, S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim. Biophys. Acta (BBA)—Mol. Basis Dis. 2024, 1870, 167120.
- Liu, X.; Tao, Y.; Cai, Z.; Bao, P.; Ma, H.; Li, K.; Li, M.; Zhu, Y.; Lu, Z.J.; Wren, J. Pathformer: A biological pathway informed transformer for disease diagnosis and prognosis using multi-omics data. Bioinformatics 2024, 40, btae316.
- Oh, J.H.; Choi, W.; Ko, E.; Kang, M.; Tannenbaum, A.; Deasy, J.O. PathCNN: Interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics 2021, 37, i443–i450.
- Wang, T.; Shao, W.; Huang, Z.; Tang, H.; Zhang, J.; Ding, Z.; Huang, K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021, 12, 3445.
- Wolfram-Schauerte, M.; Vogel, T.; Tuoken, H.; Fälth Savitski, M.; Simon, E.; Nieselt, K. Approaching the holistic transcriptome—Convolution and deconvolution in transcriptomics. Brief. Bioinform. 2025, 26, bbaf388.
- Newman, A.M.; Steen, C.B.; Liu, C.L.; Gentles, A.J.; Chaudhuri, A.A.; Scherer, F.; Khodadoust, M.S.; Esfahani, M.S.; Luca, B.A.; Steiner, D.; et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019, 37, 773–782.
- Wang, C.; Lin, Y.; Li, S.; Guan, J. Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-seq data. BMC Genom. 2024, 25, 875.
- Chu, T.; Wang, Z.; Pe’er, D.; Danko, C.G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 2022, 3, 505–517.
- Li, Z.; Wu, H. TOAST: Improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol. 2019, 20, 190.
- Wang, L.; Sebra, R.P.; Sfakianos, J.P.; Allette, K.; Wang, W.; Yoo, S.; Bhardwaj, N.; Schadt, E.E.; Yao, X.; Galsky, M.D.; et al. A reference profile-free deconvolution method to infer cancer cell-intrinsic subtypes and tumor-type-specific stromal profiles. Genome Med. 2020, 12, 24.
- Riley, R.D.; Collins, G.S. Stability of clinical prediction models developed using statistical or machine learning methods. Biom. J. 2023, 65, e2200302.
- Martin, G.P.; Riley, R.D.; Ensor, J.; Grant, S.W. Statistical primer: Sample size considerations for developing and validating clinical prediction models. Eur. J. Cardiothorac. Surg. 2025, 67, ezaf142.
- Gross, B.; Dauvin, A.; Cabeli, V.; Kmetzsch, V.; El Khoury, J.; Dissez, G.; Ouardini, K.; Grouard, S.; Davi, A.; Loeb, R.; et al. Robust evaluation of deep learning-based representation methods for survival and gene essentiality prediction on bulk RNA-seq data. Sci. Rep. 2024, 14, 17064.
- Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2021, 2, 420.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Miller, C.; Portlock, T.; Nyaga, D.M.; O’Sullivan, J.M. A review of model evaluation metrics for machine learning in genetics and genomics. Front. Bioinform. 2024, 4, 1457619.
- Savvides, R.; Mäkelä, J.; Puolamäki, K. Model selection with bootstrap validation. Stat. Anal. Data Min. ASA Data Sci. J. 2023, 16, 162–186.
- Huang, A.A.; Huang, S.Y. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS ONE 2023, 18, e0281922.
- Al Seesi, S.; Tiagueu, Y.T.; Zelikovsky, A.; Mandoiu, I.I. Bootstrap-based differential gene expression analysis for RNA-Seq data with and without replicates. BMC Genom. 2014, 15, S2.
- Stupnikov, A.; McInerney, C.E.; Savage, K.I.; McIntosh, S.A.; Emmert-Streib, F.; Kennedy, R.; Salto-Tellez, M.; Prise, K.M.; McArt, D.G. Robustness of differential gene expression analysis of RNA-seq. Comput. Struct. Biotechnol. J. 2021, 19, 3470–3481.
- Conesa, A.; Madrigal, P.; Tarazona, S.; Gomez-Cabrero, D.; Cervera, A.; McPherson, A.; Szczesniak, M.W.; Gaffney, D.J.; Elo, L.L.; Zhang, X.; et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016, 17, 13, Erratum in Genome Biol. 2016, 17, 181. https://doi.org/10.1186/s13059-016-1047-4.
- Van, R.; Alvarez, D.; Mize, T.; Gannavarapu, S.; Chintham Reddy, L.; Nasoz, F.; Han, M.V. A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies. BMC Bioinform. 2024, 25, 181.
- Cai, Z.; Poulos, R.C.; Liu, J.; Zhong, Q. Machine learning for multi-omics data integration in cancer. iScience 2022, 25, 103798.
- Younis, H.; Minghim, R. Enhancing Cancer Classification from RNA Sequencing Data Using Deep Learning and Explainable AI. Mach. Learn. Knowl. Extr. 2025, 7, 114.
- Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
- Nilsson, A.; Meimetis, N.; Lauffenburger, D.A. Towards an interpretable deep learning model of cancer. npj Precis. Oncol. 2025, 9, 46.
- Watson, D.S. Interpretable machine learning for genomics. Hum. Genet. 2022, 141, 1499–1513.



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pudova, E.A.; Pavlov, V.S.; Guvatova, Z.G.; Fedorova, M.S.; Shegai, P.V.; Kudryavtseva, A.V.; Snezhkina, A.V. Machine Learning Models for Cancer Research: A Narrative Review of Bulk RNA-Seq Applications. Int. J. Mol. Sci. 2025, 26, 12081. https://doi.org/10.3390/ijms262412081

