Search Results (38)

Search Parameters:
Keywords = machine learning ovarian cancer prediction

29 pages, 959 KiB  
Review
Machine Learning-Driven Insights in Cancer Metabolomics: From Subtyping to Biomarker Discovery and Prognostic Modeling
by Amr Elguoshy, Hend Zedan and Suguru Saito
Metabolites 2025, 15(8), 514; https://doi.org/10.3390/metabo15080514 - 1 Aug 2025
Viewed by 256
Abstract
Cancer metabolic reprogramming plays a critical role in tumor progression and therapeutic resistance, underscoring the need for advanced analytical strategies. Metabolomics, leveraging mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy, offers a comprehensive and functional readout of tumor biochemistry. By enabling both targeted metabolite quantification and untargeted profiling, metabolomics captures the dynamic metabolic alterations associated with cancer. The integration of metabolomics with machine learning (ML) approaches further enhances the interpretation of these complex, high-dimensional datasets, providing powerful insights into cancer biology from biomarker discovery to therapeutic targeting. This review systematically examines the transformative role of ML in cancer metabolomics. We discuss how various ML methodologies—including supervised algorithms (e.g., Support Vector Machine, Random Forest), unsupervised techniques (e.g., Principal Component Analysis, t-SNE), and deep learning frameworks—are advancing cancer research. Specifically, we highlight three major applications of ML–metabolomics integration: (1) cancer subtyping, exemplified by the use of Similarity Network Fusion (SNF) and LASSO regression to classify triple-negative breast cancer into subtypes with distinct survival outcomes; (2) biomarker discovery, where Random Forest and Partial Least Squares Discriminant Analysis (PLS-DA) models have achieved >90% accuracy in detecting breast and colorectal cancers through biofluid metabolomics; and (3) prognostic modeling, demonstrated by the identification of race-specific metabolic signatures in breast cancer and the prediction of clinical outcomes in lung and ovarian cancers. Beyond these areas, we explore applications across prostate, thyroid, and pancreatic cancers, where ML-driven metabolomics is contributing to earlier detection, improved risk stratification, and personalized treatment planning. 
We also address critical challenges, including issues of data quality (e.g., batch effects, missing values), model interpretability, and barriers to clinical translation. Emerging solutions, such as explainable artificial intelligence (XAI) approaches and standardized multi-omics integration pipelines, are discussed as pathways to overcome these hurdles. By synthesizing recent advances, this review illustrates how ML-enhanced metabolomics bridges the gap between fundamental cancer metabolism research and clinical application, offering new avenues for precision oncology through improved diagnosis, prognosis, and tailored therapeutic strategies. Full article
(This article belongs to the Special Issue Nutritional Metabolomics in Cancer)

39 pages, 2187 KiB  
Review
Interleukin-6 Is a Crucial Factor in Shaping the Inflammatory Tumor Microenvironment in Ovarian Cancer and Determining Its Hot or Cold Nature with Diagnostic and Prognostic Utilities
by Hina Amer, Katie L. Flanagan, Nirmala C. Kampan, Catherine Itsiopoulos, Clare L. Scott, Apriliana E. R. Kartikasari and Magdalena Plebanski
Cancers 2025, 17(10), 1691; https://doi.org/10.3390/cancers17101691 - 17 May 2025
Cited by 1 | Viewed by 1666
Abstract
Ovarian cancer (OC) remains the leading cause of cancer-related deaths among women, often diagnosed at advanced stages due to the lack of effective early diagnostic procedures. To reduce the high mortality rates in OC, reliable biomarkers are urgently needed, especially to detect OC at its earliest stage, predict specific drug responses, and monitor patients. The cytokine interleukin-6 (IL6) is associated with low survival rates, treatment resistance, and recurrence. In this review, we summarize the role of IL6 in inflammation and how IL6 contributes to ovarian tumorigenesis within the tumor microenvironment, influencing whether the tumor is subsequently classified as “hot” or “cold”. We further dissect the molecular and cellular mechanisms through which IL6 production and downstream signaling are regulated, to enhance our understanding of its involvement in OC development, as well as OC resistance to treatment. We highlight the potential of IL6 to be used as a reliable diagnostic biomarker to help detect OC at its earliest stage, and as a part of predictive and prognostic signatures to improve OC management. We further discuss ways to leverage artificial intelligence and machine learning to integrate IL6 into diverse biomarker-based strategies. Full article

16 pages, 579 KiB  
Systematic Review
New Evidence About Malignant Transformation of Endometriosis—A Systematic Review
by Alexandra Ioannidou, Maria Sakellariou, Vaia Sarli, Periklis Panagopoulos and Nikolaos Machairiotis
J. Clin. Med. 2025, 14(9), 2975; https://doi.org/10.3390/jcm14092975 - 25 Apr 2025
Viewed by 1962
Abstract
Background: Endometriosis is a benign gynecologic condition that has the risk of malignant transformation in approximately 0.5–1% of cases, most of which develop into endometriosis-associated ovarian cancers (EAOCs), such as clear cell and endometrioid adenocarcinomas. The current systematic review aims to condense recent information on the genetic and molecular mechanisms, clinical risk factors, and possible therapeutic targets of the malignant transformation of endometriosis. Methods: A systematic literature search of PubMed, Europe PMC, and Google Scholar was carried out according to PRISMA guidelines for articles published until December 2024. Following a screening of 44,629 titles, 43 full articles were included after meeting inclusion criteria. No case reports or reviews were included, and articles had to mention a malignant transformation of endometriosis and not just a diagnosis of cancer. The quality and risk of bias of studies were evaluated using ROBINS-I. Results: Malignant transformation of endometriosis is associated with genetic alterations, including ARID1A mutations, microsatellite instability, and abnormal PI3K/Akt and mTOR pathway activation. Increased oxidative stress, inflammation-driven mismatch repair deficiency, and epigenetic alterations like RUNX3 and RASSF2 hypermethylation are implicated in carcinogenesis. Clinical risk factors are advanced age (40–60 years), large ovarian endometriomas (>9 cm), postmenopausal status, and prolonged estrogen exposure. Imaging techniques like MR relaxometry and risk models based on machine learning are highly predictive for early detection. Conclusions: Endometriosis carcinogenesis is a multifactorial process driven by genetic changes, oxidative stress, and inflammatory mechanisms. Identification of high-risk individuals through molecular and imaging biomarkers may result in early detection and personalized therapy. 
Further research should aim at the development of more precise predictive models and exploration of precision medicine approaches to inhibit the emergence of EAOC. Full article

18 pages, 2558 KiB  
Systematic Review
Artificial Intelligence in Ovarian Cancer: A Systematic Review and Meta-Analysis of Predictive AI Models in Genomics, Radiomics, and Immunotherapy
by Mauro Francesco Pio Maiorano, Gennaro Cormio, Vera Loizzi and Brigida Anna Maiorano
AI 2025, 6(4), 84; https://doi.org/10.3390/ai6040084 - 18 Apr 2025
Viewed by 2612
Abstract
Background/Objectives: Artificial intelligence (AI) is increasingly influencing oncological research by enabling precision medicine in ovarian cancer through enhanced prediction of therapy response and patient stratification. This systematic review and meta-analysis was conducted to assess the performance of AI-driven models across three key domains: genomics and molecular profiling, radiomics-based imaging analysis, and prediction of immunotherapy response. Methods: Relevant studies were identified through a systematic search across multiple databases (2020–2025), adhering to PRISMA guidelines. Results: Thirteen studies met the inclusion criteria, involving over 10,000 ovarian cancer patients and encompassing diverse AI models such as machine learning classifiers and deep learning architectures. Pooled AUCs indicated strong predictive performance for genomics-based (0.78), radiomics-based (0.88), and immunotherapy-based (0.77) models. Notably, radiogenomics-based AI integrating imaging and molecular data yielded the highest accuracy (AUC = 0.975), highlighting the potential of multi-modal approaches. Heterogeneity and risk of bias were assessed, and evidence certainty was graded. Conclusions: Overall, AI demonstrated promise in predicting therapeutic outcomes in ovarian cancer, with radiomics and integrated radiogenomics emerging as leading strategies. Future efforts should prioritize explainability, prospective multi-center validation, and integration of immune and spatial transcriptomic data to support clinical implementation and individualized treatment strategies. Unlike earlier reviews, this study synthesizes a broader range of AI applications in ovarian cancer and provides pooled performance metrics across diverse models. It examines the methodological soundness of the selected studies and highlights current gaps and opportunities for clinical translation, offering a comprehensive and forward-looking perspective in the field. Full article

13 pages, 476 KiB  
Article
Prediction of Clavien Dindo Classification ≥ Grade III Complications After Epithelial Ovarian Cancer Surgery Using Machine Learning Methods
by Aysun Alci, Fatih Ikiz, Necim Yalcin, Mustafa Gokkaya, Gulsum Ekin Sari, Isin Ureyen and Tayfun Toptas
Medicina 2025, 61(4), 695; https://doi.org/10.3390/medicina61040695 - 10 Apr 2025
Viewed by 656
Abstract
Background and Objectives: Ovarian cancer surgery requires multiple radical resections with a high risk of complications. The aim of this single-centre, retrospective study was to determine the best method for predicting Clavien–Dindo grade ≥ III complications using machine learning techniques. Material and Methods: The study included 179 patients who underwent surgery at the gynaecological oncology department of Antalya Training and Research Hospital between January 2015 and December 2020. The data were randomly split into a training set (n = 134, 75%) and a test set (n = 45, 25%). We used 49 predictors to develop the best algorithm. Mean absolute error, root mean squared error, correlation coefficients, Matthews correlation coefficient, and F1 score were used to determine the best-performing algorithm. Cohen’s kappa value was evaluated to analyse the consistency of the model with real data. The relationship between the predicted values and the actual values was then summarised using a confusion matrix. True positive (TP) rate, false positive (FP) rate, precision, recall, and area under the curve (AUC) values were evaluated to demonstrate clinical usability and classification skills. Results: 139 patients (77.65%) had no morbidity or grade I-II CDC morbidity, while 40 patients (22.35%) had grade III or higher CDC morbidity. BayesNet was found to be the most effective prediction model. No dominant parameter was observed in the Bayesian net importance matrix plot. The true positive (TP) rate was 76%, false positive (FP) rate was 15.6%, recall rate (sensitivity) was 76.9%, and overall accuracy was 82.2%. A receiver operating characteristic (ROC) analysis was performed to estimate CDC grade ≥ III. AUC was 0.863 with a statistical significance of p < 0.001, indicating a high degree of accuracy. 
Conclusions: The Bayesian network model achieved the highest accuracy compared to all other models in predicting CDC Grade ≥ III complications following epithelial ovarian cancer surgery. Full article
(This article belongs to the Section Obstetrics and Gynecology)
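
As a minimal sketch of how the evaluation metrics named in this abstract relate to a confusion matrix, the Python below derives recall, false positive rate, precision, accuracy, F1, Matthews correlation coefficient, and Cohen's kappa from four counts. The function name and the example counts are assumptions for illustration: the counts were chosen to total the 45-patient test set and to be consistent with the reported sensitivity, FP rate, and accuracy, but they are not taken from the article.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # true positive rate (sensitivity)
    fpr = fp / (fp + tn)           # false positive rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "recall": recall, "fpr": fpr, "precision": precision,
        "accuracy": accuracy, "f1": f1, "mcc": mcc, "kappa": kappa,
    }

# Hypothetical counts (assumption): 45 test patients in total.
metrics = classification_metrics(tp=10, fp=5, tn=27, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, recall, FP rate, and accuracy round to 76.9%, 15.6%, and 82.2%, matching the figures quoted above; the remaining metrics are shown only to illustrate the formulas.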

15 pages, 1859 KiB  
Article
Prediction of Early Diagnosis in Ovarian Cancer Patients Using Machine Learning Approaches with Boruta and Advanced Feature Selection
by Tuğçe Öznacar and Tunç Güler
Life 2025, 15(4), 594; https://doi.org/10.3390/life15040594 - 3 Apr 2025
Cited by 3 | Viewed by 950
Abstract
Objectives: Ovarian cancer continues to be one of the most prevalent gynecological cancers diagnosed. Early detection is critical for increasing survival chances. This research aims to assess feature extraction and feature selection across various machine learning techniques for better modelling of ovarian cancer. By eliminating irrelevant features, this approach could guide clinicians towards more accurate results and optimize diagnostic precision. Methods: This study included patients with and without ovarian cancer, creating a dataset containing 50 independent variables/features. Eight machine learning algorithms (Random Forest, XGBoost, CatBoost, Decision Tree, K-Nearest Neighbors, Naive Bayes, Gradient Boosting, and Support Vector Machine) were evaluated alongside four feature selection techniques: Boruta, principal component analysis (PCA), recursive feature elimination (RFE), and mutual information (MI). Performance metrics were evaluated to identify the best combination for diagnosis. Results: The results were obtained with a significantly reduced number of features. Random Forest and CatBoost performed significantly differently from the other algorithms (AUC 0.94 and 0.95, respectively). In addition, the feature selection methods Boruta and RFE consistently yielded higher AUC and accuracy scores than the others. Conclusions: This study highlights the importance of choosing appropriate machine learning algorithms and feature selection techniques for ovarian cancer diagnosis. Boruta and RFE showed high accuracy. By reducing the number of features from 50 to the most relevant ones, clinicians can make more precise diagnoses, enhance patient outcomes, and reduce unnecessary tests. Full article
(This article belongs to the Section Medical Research)
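
Of the four feature selection techniques listed in this abstract, mutual information (MI) is the simplest to show in a few lines. The sketch below ranks discrete features by their MI with the class label. The function names, feature names, and toy data are hypothetical, and the study's actual implementation is not described in the abstract; continuous clinical variables would first need binning or a dedicated estimator.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi

def rank_features(columns, labels):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (assumption): 1 = cancer, 0 = no cancer.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "marker_a": [1, 1, 1, 0, 0, 0],  # tracks the label exactly
    "marker_b": [1, 0, 1, 0, 1, 0],  # weakly related
    "marker_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}
print(rank_features(columns, labels))
```

A constant feature scores zero and falls to the bottom of the ranking, which is the sense in which such a filter "eliminates irrelevant features".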

20 pages, 2054 KiB  
Article
Development and Internal Validation of a Machine Learning-Based Colorectal Cancer Risk Prediction Model
by Deborah Jael Herrera, Daiane Maria Seibert, Karen Feyen, Marlon van Loo, Guido Van Hal and Wessel van de Veerdonk
Gastrointest. Disord. 2025, 7(2), 26; https://doi.org/10.3390/gidisord7020026 - 24 Mar 2025
Cited by 1 | Viewed by 1785
Abstract
Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. While screening tools such as the fecal immunochemical test (FIT) aid in early detection, they do not provide insights into individual risk factors or strategies for primary prevention. This study aimed to develop and internally validate an interpretable machine learning-based model that estimates an individual’s probability of developing CRC using readily available clinical and lifestyle factors. Methods: We analyzed data from 154,887 adults, aged 55–74 years, who participated in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. A risk prediction model was built using the Light Gradient Boosting Machine (LightGBM) algorithm. To translate these findings into clinical practice, we implemented the model into a risk estimator that categorizes individuals as average, increased, or high risk, highlighting modifiable risk factors to support patient–clinician discussions on lifestyle changes. Results: The LightGBM model incorporated 12 predictive variables, with age, weight, and smoking history identified as the strongest CRC risk factors, while heart medication use appeared to have a potentially protective effect. The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.726 (95% confidence interval [CI]: 0.698–0.753), correctly distinguishing high-risk from average-risk individuals 73 out of 100 times. Conclusions: Our findings suggest that this model could support clinicians and individuals considering screening by guiding informed decision making and facilitating patient–clinician discussions on CRC prevention through personalized lifestyle modifications. However, before clinical implementation, external validation is needed to ensure its reliability across diverse populations and confirm its effectiveness in real-world healthcare settings. Full article
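
The "73 out of 100 times" reading of the AUROC in this abstract corresponds to the pairwise (concordance) interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The following is a small sketch of that calculation; the function name and the scores are hypothetical and unrelated to the PLCO data.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted risks (assumption).
cases = [0.9, 0.6, 0.4]
non_cases = [0.7, 0.3, 0.2, 0.4]
print(round(auroc(cases, non_cases), 3))
```

This brute-force version compares every pair, so it suits small illustrations; for a cohort the size of the one described, a rank-based computation would be the practical choice.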

18 pages, 3720 KiB  
Article
Double-Weighted Bayesian Model Combination for Metabolomics Data Description and Prediction
by Jacopo Troisi, Martina Lombardi, Alessio Trotta, Vera Abenante, Andrea Ingenito, Nicole Palmieri, Sean M. Richards, Steven J. K. Symes and Pierpaolo Cavallo
Metabolites 2025, 15(4), 214; https://doi.org/10.3390/metabo15040214 - 21 Mar 2025
Viewed by 663
Abstract
Background/Objectives: This study presents a novel double-weighted Bayesian Ensemble Machine Learning (DW-EML) model aimed at improving the classification and prediction of metabolomics data. This discipline, which involves the comprehensive analysis of metabolites in a biological system, provides valuable insights into complex biological processes and disease states. As metabolomics assumes an increasingly prominent role in the diagnosis of human diseases and in precision medicine, there is a pressing need for more robust artificial intelligence tools that can offer enhanced reliability and accuracy in medical applications. The proposed DW-EML model addresses this by integrating multiple classifiers within a double-weighted voting scheme, which assigns weights based on the cross-validation accuracy and classification confidence, ensuring a more reliable prediction framework. Methods: The model was applied to publicly available datasets derived from studies on critical illness in children, chronic typhoid carriage, and early detection of ovarian cancer. Results: The results demonstrate that the DW-EML approach outperformed methods traditionally used in metabolomics, such as the Partial Least Squares Discriminant Analysis in terms of accuracy and predictive power. Conclusions: The DW-EML model is a promising tool for metabolomic data analysis, offering enhanced robustness and reliability for diagnostic and prognostic applications and potentially contributing to the advancement of personalized and precision medicine. Full article
(This article belongs to the Section Bioinformatics and Data Analysis)
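
The abstract describes a voting scheme in which each classifier's vote is weighted by both its cross-validation accuracy and its classification confidence, but it does not give the formula. The sketch below assumes the simplest combination, the product of the two weights, purely for illustration; the function name and numbers are hypothetical and this is not the authors' DW-EML implementation.

```python
from collections import defaultdict

def double_weighted_vote(predictions):
    """Combine votes given as (label, cv_accuracy, confidence) tuples.

    Assumption: each vote counts cv_accuracy * confidence toward its label.
    """
    totals = defaultdict(float)
    for label, cv_accuracy, confidence in predictions:
        totals[label] += cv_accuracy * confidence
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# Hypothetical ensemble output for one sample (assumption).
votes = [
    ("cancer", 0.90, 0.95),   # accurate and confident classifier
    ("control", 0.70, 0.60),
    ("control", 0.65, 0.55),
]
winner, totals = double_weighted_vote(votes)
print(winner, totals)
```

In this toy case two of three classifiers vote "control", yet the single accurate and confident classifier outweighs them, which is the behaviour that distinguishes a weighted scheme from plain majority voting.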

19 pages, 4910 KiB  
Article
A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis
by Jingxun Cai, Zne-Jung Lee, Zhihxian Lin and Ming-Ren Yang
Mathematics 2025, 13(5), 882; https://doi.org/10.3390/math13050882 - 6 Mar 2025
Viewed by 939
Abstract
Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existing diagnostic methods, such as biomarker testing and imaging, can help with early diagnosis to some extent, these methods still have limitations in sensitivity and accuracy, often leading to misdiagnosis or missed diagnosis. Ovarian cancer’s high heterogeneity and complexity increase diagnostic challenges, especially in disease progression prediction and patient classification. Machine learning (ML) has outperformed traditional methods in cancer detection by processing large datasets to identify patterns missed by conventional techniques. However, existing AI models still struggle with accuracy in handling imbalanced and high-dimensional data, and their “black-box” nature limits clinical interpretability. To address these issues, this study proposes SHAP-GAN, an innovative diagnostic model for ovarian cancer that integrates Shapley Additive exPlanations (SHAP) with Generative Adversarial Networks (GANs). The SHAP module quantifies each biomarker’s contribution to the diagnosis, while the GAN component optimizes medical data generation. This approach tackles three key challenges in medical diagnosis: data scarcity, model interpretability, and diagnostic accuracy. Results show that SHAP-GAN outperforms traditional methods in sensitivity, accuracy, and interpretability, particularly with high-dimensional and imbalanced ovarian cancer datasets. The top three influential features identified are PRR11, CIAO1, and SMPD3, which exhibit wide SHAP value distributions, highlighting their significant impact on model predictions. 
The SHAP-GAN network has demonstrated an impressive accuracy rate of 99.34% on the ovarian cancer dataset, significantly outperforming baseline algorithms, including Support Vector Machines (SVM), Logistic Regression (LR), and XGBoost. Specifically, SVM achieved an accuracy of 72.78%, LR achieved 86.09%, and XGBoost achieved 96.69%. These results highlight the superior performance of SHAP-GAN in handling high-dimensional and imbalanced datasets. Furthermore, SHAP-GAN significantly alleviates the challenges associated with intricate genetic data analysis, empowering medical professionals to tailor personalized treatment strategies for individual patients. Full article
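
To make concrete what "quantifies each biomarker's contribution" means, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. Everything here is an assumption for illustration: the feature names g1 to g3 and the value function are invented, and SHAP implementations approximate this quantity rather than enumerating orderings, which is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            contrib[f] += after - before
    return {f: total / len(orderings) for f, total in contrib.items()}

def toy_model(known):
    """Hypothetical model output when only the features in `known` are available."""
    score = 0.0
    if "g1" in known:
        score += 0.2
    if "g2" in known:
        score += 0.1
    if "g1" in known and "g2" in known:
        score += 0.3  # interaction shared between g1 and g2
    return score      # g3 never changes the output

phi = shapley_values(toy_model, ["g1", "g2", "g3"])
print(phi)
```

The three values sum to the difference between the model output with all features and with none, and the irrelevant feature receives zero, the two properties that make Shapley-based attributions usable for ranking features.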

14 pages, 1325 KiB  
Systematic Review
Predicting Response to Treatment and Survival in Advanced Ovarian Cancer Using Machine Learning and Radiomics: A Systematic Review
by Sabrina Piedimonte, Mariam Mohamed, Gabriela Rosa, Brigit Gerstl and Danielle Vicus
Cancers 2025, 17(3), 336; https://doi.org/10.3390/cancers17030336 - 21 Jan 2025
Cited by 1 | Viewed by 1938
Abstract
Background and Objective: Machine learning and radiomics (ML/RM) are gaining interest in ovarian cancer (OC) but only a few studies have used these methods to predict treatment response. The objective of this study was to review the literature on the applications of ML/RM in OC assessments, specifically focusing on studies describing algorithms to predict treatment response and survival. Methods: This is a systematic review of the published literature from January 1985 to December 2023 on the use of ML/RM in OC. An extensive search of electronic library databases was conducted. Two independent reviewers screened the articles initially by title then by full text. Quality was assessed using the MINORS criteria. p-values were generated using Pearson’s chi-squared (χ²) test to compare the performances of ML/RM models with traditional statistics. Results: Of the 5576 screened articles, 225 studies were included. Between 2021 and 2023, 49 studies were published, highlighting the rapidly growing interest in ML/RM. Median quality scores on the MINORS scale were the same for studies published in 1985–2021 and in 2021–2023 (both 8). Neural Networks (22.6%) and LASSO (15.3%) were the most common ML/RM algorithms in OC. Among these studies, 13 focused specifically on prediction of treatment response using radiomics. A total of 5113 patients were analyzed. The most common algorithms were Random Forest (4/13) followed by Neural Networks (3/13) and Support Vector Machines (3/13). Radiomic analysis was used to predict response to neoadjuvant chemotherapy in seven studies, with a median AUC of 0.77 (range 0.72–0.93), while the median AUC was 0.82 (range 0.77–0.89) in the six studies assessing the prediction of optimal or complete cytoreduction. Median model accuracy reported in 7/13 studies was 73% (range 66–98%). Additionally, four studies investigated the use of ML/RM for survival prediction for OC. 
The XGBoost model had 80.9% accuracy in predicting 5-year survival compared to linear regression, which achieved 79% accuracy. The Random Forest model had 93.7% accuracy in predicting 12-month progression-free survival, compared to 82% for linear regression. Conclusions: We found that the use of ML/RM algorithms is becoming a more frequent method to predict responses to treatment of OC. These models should be validated in a prospective multicenter trial prior to integration into clinical use. Full article
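
The abstract mentions Pearson's chi-squared test for comparing model performance. As a rough sketch of what such a comparison involves, the code below computes the statistic and p-value for a 2x2 table of correct versus incorrect predictions from two models. The counts are hypothetical (the abstract gives accuracies but not sample sizes), the function name is an assumption, and no continuity correction is applied.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts (assumption): rows are models, columns are correct / incorrect.
chi2, p = chi_squared_2x2(90, 10, 80, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Treating the two rows as independent samples is itself an assumption; if both models were scored on the same patients, a paired test such as McNemar's would be the more usual choice.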

21 pages, 473 KiB  
Article
Feature Selection in Cancer Classification: Utilizing Explainable Artificial Intelligence to Uncover Influential Genes in Machine Learning Models
by Matheus Dalmolin, Karolayne S. Azevedo, Luísa C. de Souza, Caroline B. de Farias, Martina Lichtenfels and Marcelo A. C. Fernandes
AI 2025, 6(1), 2; https://doi.org/10.3390/ai6010002 - 27 Dec 2024
Cited by 1 | Viewed by 2515
Abstract
This study investigates the use of machine learning (ML) models combined with explainable artificial intelligence (XAI) techniques to identify the most influential genes in the classification of five recurrent cancer types in women: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Gene expression data from RNA-seq, extracted from The Cancer Genome Atlas (TCGA), were used to train ML models, including decision trees (DTs), random forest (RF), and XGBoost (XGB), which achieved accuracies of 98.69%, 99.82%, and 99.37%, respectively. However, the challenges in this analysis included the high dimensionality of the dataset and the lack of transparency in the ML models. To mitigate these challenges, the SHAP (Shapley Additive Explanations) method was applied to generate a list of features, aiming to understand which characteristics influenced the models’ decision-making processes and, consequently, the prediction results for the five tumor types. The SHAP analysis identified 119, 80, and 10 genes for the RF, XGB, and DT models, respectively, totaling 209 genes, resulting in 172 unique genes. The new list, representing 0.8% of the original input features, is coherent and fully explainable, increasing confidence in the applied models. Additionally, the results suggest that the SHAP method can be effectively used as a feature selector in gene expression data. This approach not only enhances model transparency but also maintains high classification performance, highlighting its potential in identifying biologically relevant features that may serve as biomarkers for cancer diagnostics and treatment planning. Full article
(This article belongs to the Section Medical & Healthcare AI)
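
The gene counts in this abstract follow from simple set arithmetic: the three per-model lists contain 119 + 80 + 10 = 209 entries, and 172 unique genes remain once genes selected by more than one model are counted once, so 37 entries are repeats. A minimal sketch of that merge, using a hypothetical function name and placeholder gene identifiers rather than the study's lists:

```python
def merge_gene_lists(lists_by_model):
    """Summarise per-model gene lists: total entries, unique genes, repeated entries."""
    total = sum(len(genes) for genes in lists_by_model.values())
    unique = set().union(*lists_by_model.values())
    return {"total": total, "unique": len(unique), "repeats": total - len(unique)}

# Placeholder identifiers (assumption), not genes from the study.
toy_lists = {
    "RF": ["A", "B", "C"],
    "XGB": ["B", "C", "D"],
    "DT": ["A"],
}
print(merge_gene_lists(toy_lists))

# Figures reported in the abstract.
reported_total = 119 + 80 + 10
reported_repeats = reported_total - 172
print(reported_total, reported_repeats)
```

If 172 genes are about 0.8% of the inputs, the original feature set would be on the order of 21,500 genes; that is an inference from the rounded percentage, not a number stated in the abstract.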

18 pages, 4029 KiB  
Article
An Integrated Algorithm with Feature Selection, Data Augmentation, and XGBoost for Ovarian Cancer
by Jingxun Cai, Zne-Jung Lee, Zhihxian Lin, Chih-Hung Hsu and Yun Lin
Mathematics 2024, 12(24), 4041; https://doi.org/10.3390/math12244041 - 23 Dec 2024
Cited by 1 | Viewed by 1109
Abstract
Ovarian cancer is one of the most aggressive gynecological cancers owing to its high invasiveness and chemoresistance. It not only has a high incidence rate but also tops the list of mortality rates. Its subtle early symptoms make diagnosis difficult, significantly delaying timely treatment for patients. Once ovarian cancer reaches an advanced stage, the complexity and difficulty of treatment increase substantially, affecting patient survival rates. Therefore, it is crucial for both medical professionals and patients to remain highly vigilant about the early signs of ovarian cancer to ensure timely intervention. In recent years, ovarian cancer prediction research has advanced, allowing the likelihood and type of cancer to be analyzed from patients’ genetic data. With the rapid development of machine learning, numerous efficient classification prediction models have emerged, offering significant potential for developing ovarian cancer diagnostic prediction methods. However, traditional approaches often struggle to achieve satisfactory classification accuracy on high-dimensional genetic datasets with small sample sizes. This study proposes a prediction model that uses genomic data to improve the early diagnosis rate of ovarian cancer, combining feature selection, data augmentation through adversarial conditional generative adversarial networks (AC-GAN), and an extreme gradient boosting (XGBoost) classifier. First, feature selection simplifies the original genetic dataset by removing irrelevant variables and noise, thereby improving the model’s predictive accuracy. Following dimensionality reduction, AC-GAN enriches the data, producing more realistic genetic samples to enhance the model’s generalization capacity. Finally, the XGBoost classifier is applied to the augmented data to predict ovarian cancer efficiently. The results indicate that the proposed diagnostic method has a significant advantage in the predictive diagnosis of ovarian cancer, with an accuracy of 99.01% that surpasses the current technologies in use. Additionally, the algorithm identifies twelve genes highly relevant to ovarian cancer, providing valuable insights for physicians during diagnosis.

26 pages, 4858 KiB  
Article
In Silico Design of Peptide Inhibitors Targeting HER2 for Lung Cancer Therapy
by Heba Ahmed Alkhatabi and Hisham N. Alatyb
Cancers 2024, 16(23), 3979; https://doi.org/10.3390/cancers16233979 - 27 Nov 2024
Cited by 1 | Viewed by 2043
Abstract
Background/Objectives: Human epidermal growth factor receptor 2 (HER2) is overexpressed in several malignancies, such as breast, gastric, ovarian, and lung cancers, where it promotes aggressive tumor proliferation and unfavorable prognosis. Targeting HER2 has thus emerged as a crucial therapeutic strategy, particularly for HER2-positive malignancies. The present study focuses on the design and optimization of peptide inhibitors targeting HER2, using machine learning to identify and enhance peptide candidates with elevated binding affinities. The aim is to provide novel therapeutic options for malignancies linked to HER2 overexpression. Methods: This study started with the extraction and structural examination of the HER2 protein, followed by the design of peptide sequences derived from essential interaction residues. A machine learning technique (XGBRegressor model) was employed to predict binding affinities, identifying the top 20 peptide candidates. The candidates underwent further screening via the FreeSASA methodology and binding free energy calculations, resulting in the selection of four primary candidates (pep-17, pep-7, pep-2, and pep-15). Density functional theory (DFT) calculations were used to evaluate molecular and reactivity characteristics, while molecular dynamics simulations were performed to investigate inhibitory mechanisms and selectivity effects. Advanced computational methods, such as QM/MM simulations, provided further insight into peptide–protein interactions. Results: Among the four principal peptides, pep-7 exhibited the highest DFT value (−3386.93 kcal/mol) and the maximum dipole moment (10,761.58 Debye), whereas pep-17 had the lowest DFT value (−5788.49 kcal/mol) and the minimal dipole moment (2654.25 Debye). Molecular dynamics simulations indicated that pep-7 had a steady binding free energy of −12.88 kcal/mol and remained bound inside the HER2 pocket throughout a 300 ns simulation. The QM/MM simulations showed that the total energy of the system, which combines both QM and MM contributions, remained around −79,000 ± 400 kcal/mol, suggesting that the entire protein–peptide complex was in a stable state, with pep-7 maintaining a strong, well-integrated binding. Conclusions: Pep-7 emerged as the most promising therapeutic peptide, displaying strong binding stability, favorable binding free energy, and molecular stability in HER2-overexpressing cancer models. These findings suggest pep-7 as a viable therapeutic candidate for HER2-positive cancers, offering a potential novel treatment strategy against HER2-driven malignancies.

15 pages, 5634 KiB  
Article
Homogeneous Ensemble Feature Selection for Mass Spectrometry Data Prediction in Cancer Studies
by Yulan Liang, Amin Gharipour, Erik Kelemen and Arpad Kelemen
Mathematics 2024, 12(13), 2085; https://doi.org/10.3390/math12132085 - 3 Jul 2024
Cited by 1 | Viewed by 1301
Abstract
The identification of important proteins is critical for the medical diagnosis and prognosis of common diseases. Diverse sets of computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single-feature selection suffer from the multiple-testing burden and low power when few samples are available. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning (ML) may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models and thereby simplify their interpretability. In this study, we developed a three-stage homogeneous ensemble feature selection (HEFS) approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics datasets comprising (1) binary putative homologous recombination deficiency (HRD)-positive or -negative samples; (2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown samples). We compared various ML methods with HEFS, including random forest (RF), support vector machine (SVM), and neural network (NN), for predicting both binary and multiple-class outcomes. The results indicated that the prediction accuracies varied for both binary and multiple-class classifications using various ML approaches with the proposed HEFS method. RF and NN provided better prediction accuracies than simple Naive Bayes or logistic models. For binary outcomes, with a sample size of 122 and nine selected prediction proteins using our proposed three-stage HEFS approach, the best ensemble ML (Treebag) achieved 83% accuracy, 85% sensitivity, and 81% specificity. For multiple (five)-class outcomes, the proposed HEFS-selected proteins combined with Principal Component Analysis (PCA) in NN resulted in prediction accuracies ranging from 75% to 96% for each of the five classes. Despite the different prediction accuracies of the various models, HEFS identified consistent sets of proteins linked to the binary and multiple-class outcomes.
(This article belongs to the Special Issue Current Research in Biostatistics)
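The idea behind homogeneous ensemble feature selection in the abstract above (one selector applied repeatedly to resampled data, with the results aggregated for stability) can be illustrated with a minimal sketch. This is not the authors' three-stage HEFS procedure, which the abstract does not detail. The scoring rule, the function names (`score_features`, `ensemble_select`, `make_toy_data`), the thresholds, and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics


def make_toy_data(seed=1, n_samples=40, n_noise=4):
    """Synthetic data (hypothetical): features 0 and 1 differ by class, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def score_features(X, y):
    """Score each feature by the class-mean gap divided by the summed class spreads."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if len(pos) < 2 or len(neg) < 2:
            scores.append(0.0)  # degenerate resample: no usable signal
            continue
        spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-9
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread)
    return scores


def ensemble_select(X, y, top_k=2, n_rounds=50, min_freq=0.6, seed=0):
    """Apply the same selector to bootstrap resamples and keep frequently chosen features."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        scores = score_features([X[i] for i in idx], [y[i] for i in idx])
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    selected = [j for j, f in enumerate(freqs) if f >= min_freq]
    return selected, freqs


X, y = make_toy_data()
selected, freqs = ensemble_select(X, y)
print("selected feature indices:", selected)
print("selection frequencies:", freqs)
```

The selection frequency is the stability signal here: a feature that ranks highly on only a few resamples falls below `min_freq` and is dropped, which is one simple way to address the unstable selections the abstract mentions.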

29 pages, 7312 KiB  
Article
Evaluating Ovarian Cancer Chemotherapy Response Using Gene Expression Data and Machine Learning
by Soukaina Amniouel, Keertana Yalamanchili, Sreenidhi Sankararaman and Mohsin Saleet Jafri
BioMedInformatics 2024, 4(2), 1396-1424; https://doi.org/10.3390/biomedinformatics4020077 - 22 May 2024
Cited by 3 | Viewed by 3443
Abstract
Background: Ovarian cancer (OC) is the most lethal gynecological cancer in the United States. Among the different types of OC, serous ovarian cancer (SOC) stands out as the most prevalent. Transcriptomics techniques generate extensive gene expression data, yet only a few of these genes are relevant to clinical diagnosis. Methods: Feature selection (FS) methods address the challenges of high dimensionality in extensive datasets. This study proposes a computational framework that applies FS techniques to identify genes highly associated with platinum-based chemotherapy response in SOC patients. Using SOC datasets from the Gene Expression Omnibus (GEO) database, the LASSO and varSelRF FS methods were employed. Machine learning classification algorithms such as random forest (RF) and support vector machine (SVM) were also used to evaluate the performance of the models. Results: The proposed framework identified biomarker panels with 9 and 10 genes that are highly correlated with platinum–paclitaxel and platinum-only response in SOC patients, respectively. The predictive models were trained using the identified gene signatures, and an accuracy above 90% was achieved. Conclusions: In this study, we propose that applying multiple feature selection methods not only effectively reduces the number of identified biomarkers, enhancing their biological relevance, but also corroborates the efficacy of drug response prediction models in cancer treatment.
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
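The conclusion above, that combining several feature selection methods shrinks the biomarker panel, can be illustrated with a small sketch. The abstract does not say how the LASSO and varSelRF results were combined, so intersecting the two selections is an assumption. The two scoring functions below are simple stand-ins rather than LASSO or varSelRF, and the gene names and expression values are hypothetical.

```python
import math
import statistics


def mean_gap(values, labels):
    """Stand-in selector 1: absolute difference between class means (scale-dependent)."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def abs_correlation(values, labels):
    """Stand-in selector 2: absolute Pearson correlation with the label (scale-invariant)."""
    mx, my = statistics.fmean(values), statistics.fmean(labels)
    cov = sum((v - mx) * (l - my) for v, l in zip(values, labels))
    var_x = sum((v - mx) ** 2 for v in values)
    var_y = sum((l - my) ** 2 for l in labels)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


def top_k(expression, labels, score, k):
    """Return the k genes with the highest score under one selector."""
    ranked = sorted(expression, key=lambda g: score(expression[g], labels), reverse=True)
    return set(ranked[:k])


def consensus_genes(expression, labels, k=2):
    """Keep only genes chosen by both selectors (intersection is an assumption)."""
    by_gap = top_k(expression, labels, mean_gap, k)
    by_corr = top_k(expression, labels, abs_correlation, k)
    return sorted(by_gap & by_corr)


# Hypothetical data: 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],   # large, consistent difference
    "GENE_B": [2.0, 2.1, 1.9, 1.0, 1.1, 0.9],   # small but consistent difference
    "GENE_C": [9.0, 1.0, 5.0, 1.0, 9.0, 5.0],   # no class difference
    "GENE_D": [12.0, 0.0, 9.0, 2.0, 4.0, 0.0],  # large but noisy difference
}
print(consensus_genes(expression, labels))
```

In this toy case each selector picks two genes, but they agree on only one, so the consensus panel is smaller than either individual selection. Whether a smaller panel is also more biologically relevant, as the abstract proposes, cannot be judged from a sketch like this.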