Machine Learning Framework for Ovarian Cancer Diagnostics Using Plasma Lipidomics and Metabolomics

Tokareva, Alisa; Iurova, Mariia; Starodubtseva, Natalia; Chagovets, Vitaliy; Novoselova, Anastasia; Kukaev, Evgenii; Frankevich, Vladimir; Sukhikh, Gennady

doi:10.3390/ijms26146630


by Alisa Tokareva ¹, Mariia Iurova ¹,*, Natalia Starodubtseva ¹, Vitaliy Chagovets ¹, Anastasia Novoselova ¹, Evgenii Kukaev ¹,²,³, Vladimir Frankevich ¹,⁴ and Gennady Sukhikh ¹,⁵

¹ V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Ministry of Healthcare of Russian Federation, 117997 Moscow, Russia

² V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia

³ Moscow Center for Advanced Studies, 123592 Moscow, Russia

⁴ Laboratory of Translational Medicine, Siberian State Medical University, 634050 Tomsk, Russia

⁵ Department of Obstetrics, Gynecology, Perinatology and Reproductology, Institute of Professional Education, Federal State Autonomous Educational Institution of Higher Education, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation, 119991 Moscow, Russia

* Author to whom correspondence should be addressed.

Int. J. Mol. Sci. 2025, 26(14), 6630; https://doi.org/10.3390/ijms26146630

Submission received: 28 May 2025 / Revised: 25 June 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine: 3rd Edition)


Abstract

Ovarian cancer (OC), the third most common gynecologic malignancy, exhibits distinct metabolic alterations that could enable early detection via liquid biopsy. We developed an advanced machine learning pipeline integrating lipidomics (HPLC-MS, positive/negative ion modes) and NMR-based metabolomics to analyze plasma samples from 229 subjects, including 103 serous OC patients, 107 benign cases, and 19 healthy controls. By systematically evaluating feature selection methods and machine learning architectures, we identified optimal biomarker combinations for OC detection. A Convolutional Neural Network (CNN) model based on Mann–Whitney-selected features demonstrated strong discriminatory power (81% accuracy) in distinguishing malignant from benign cases, while Extreme Gradient Boosting (XGBoost) combined with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) achieved exceptional performance (96% accuracy) in differentiating benign from control samples. For multiclass classification, XGBoost with Kruskal–Wallis-selected features achieved 77% accuracy, while one-versus-one CNN models utilizing Mann–Whitney-selected features attained 78% accuracy, the best performance among the tested approaches. The complementary strengths of deep learning and ensemble methods underscore their potential for tailored diagnostic applications. While clinical implementation requires further standardization, these findings provide both a methodological framework for metabolic biomarker discovery and biological insights into OC pathophysiology, paving the way for integrated multi-omics approaches in gynecologic oncology.

Keywords:

ovarian tumor; plasma; metabolome; machine learning; feature selection; MLP; XGBoost; CNN; neural network

1. Introduction

Ovarian cancer (OC) remains a significant global health challenge, with GLOBOCAN reporting 69,472 new cases and 46,232 deaths across Europe in 2022 alone [1]. This malignancy is particularly insidious because it is frequently diagnosed at advanced stages, with approximately 70% of cases detected after regional or distant metastasis has occurred. Current diagnostic protocols typically identify ovarian tumors during routine gynecological examinations, where accurate malignancy determination becomes crucial for clinical decision-making and treatment stratification. The standard diagnostic approach includes the Risk of Malignancy Index (RMI) and Risk of Ovarian Malignancy Algorithm (ROMA), which combine ultrasound findings, menopausal status, and serum CA125 levels [2]. Although CA125 has been the gold standard biomarker for decades, its limited sensitivity (50–60% for early-stage disease) and specificity (compromised by benign conditions) have spurred research into alternative molecular signatures [3].

Recent advances in molecular diagnostics have identified promising alternatives to CA125, including protein panels (particularly CA15-3, CA 19-9, HE4, and hCG [4,5,6,7,8]), circulating miRNAs (such as miR-200 family members [8,9]), and small molecules (notably lysophospholipids and acylcarnitines [10,11]). The emerging field of metabolomics has shown great promise, with studies demonstrating that tumor-specific metabolic reprogramming [12,13] produces distinct signatures in both tissue biopsies and biological fluids [14,15,16,17]. These -omics approaches offer potential for earlier detection and more accurate differentiation between benign and malignant states, with some recent studies reporting area under the curve (AUC) values exceeding 0.90 in validation cohorts [18,19].

However, -omics research (encompassing metabolomics, proteomics, and lipidomics) faces the persistent challenge of the “curse of dimensionality” [20,21]. This phenomenon, where datasets contain orders of magnitude more features (p) than samples (n), poses significant statistical challenges for robust biomarker discovery [22,23,24,25]. Modern analytical pipelines address this through sophisticated computational strategies, including three primary feature selection approaches: filter methods (employing univariate statistical thresholds), wrapper methods (using iterative classifier performance), and embedded methods (with built-in feature selection like LASSO (least absolute shrinkage and selection operator) regression) [21,26,27]. Recent methodological innovations have demonstrated that hybrid approaches—particularly ensemble feature selection combining multiple methods [28,29,30,31,32] or consensus analysis across different platforms [14]—can improve biomarker reliability. Similarly, integrating multiple classification algorithms (including Random Forest (RF), Support Vector Machines (SVM) with non-linear kernels, and regularized regression models) has shown promise in overcoming individual method limitations [15,33,34]. Deep learning approaches are also gaining traction, with convolutional neural networks (CNN) achieving notable success in image-based OC diagnostics [26,35].
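As an illustration of the filter approach described above, the following minimal sketch ranks features by the Mann–Whitney U statistic. It is not the pipeline used in this study: the function names, the toy intensities, and the choice to rank by the raw U statistic (instead of thresholding a p-value) are assumptions made for brevity.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


def rank_features(data_a, data_b):
    """Order feature names from strongest to weakest group separation."""
    scores = {name: mann_whitney_u(data_a[name], data_b[name]) for name in data_a}
    return sorted(scores, key=scores.get)  # smaller U = less overlap


# Hypothetical toy intensities for two features in two groups.
benign = {"lipid_x": [1.0, 1.2, 0.9, 1.1], "lipid_y": [2.0, 2.5, 1.8, 2.2]}
malignant = {"lipid_x": [2.1, 2.4, 1.9, 2.2], "lipid_y": [2.1, 1.9, 2.4, 2.3]}
ranking = rank_features(benign, malignant)
```

In this toy example the two groups do not overlap at all for `lipid_x`, so it is ranked ahead of `lipid_y`. A wrapper method would instead refit a classifier on candidate subsets, and an embedded method would read the selection off the fitted model itself.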

This study aims to develop and validate robust classification models capable of distinguishing between healthy controls, benign ovarian tumors, and OC. Our approach incorporates lipidomic (high-performance liquid chromatography–mass spectrometry, HPLC-MS) and metabolomic (nuclear magnetic resonance, NMR) data from more than 200 patients while implementing advanced feature selection and machine learning techniques to address the high-dimensionality challenge inherent in -omics datasets.

2. Results

2.1. Clinical Characteristics of Study Participants

Plasma lipid and metabolite profiles were comprehensively analyzed across three well-characterized cohorts: patients with serous ovarian carcinoma (n = 103), benign ovarian tumors (n = 107), and healthy controls (n = 19) to identify potential diagnostic biomarkers. As detailed in Table 1, the OC cohort exhibited distinct histopathological characteristics, with high-grade serous carcinoma representing the predominant subtype (57%, n = 59). The remaining cases comprised borderline tumors (27%, n = 28) and low-grade carcinomas (16%, n = 16). Among high-grade cases classified by International Federation of Gynecology and Obstetrics (FIGO) staging, the distribution was as follows: stage I (n = 5, 8.5%), stage II (n = 6, 10.2%), stage III (n = 44, 74.6%), and stage IV (n = 4, 6.8%).

2.2. Plasma Lipidome/Metabolome Data

Our HPLC-MS analysis identified 280 distinct lipid species, representing diverse biochemical classes. The plasma lipid profile included 49 ether-linked glycerophospholipids (PC O-/P- and PE O-/P-), 45 diacylphosphatidylcholines (PC), and 36 oxidized lipids (OxL), along with 27 sphingomyelins (SM), 13 lysophospholipids (LPC, LPE), 15 monogalactosyldiacylglycerols (MGDG), 9 ceramides (Cer), 7 cholesterol esters (CE), and 54 triglycerides (TG). Detection specificity varied by lipid class: TG and CE were observable only in positive ion mode, while oxidized glycerophospholipids and diacylphosphatidylinositols required negative ion mode analysis. Cer and LPE demonstrated optimal detection in negative mode, in contrast to LPC, PC O-/P-, and SM, which were reliably detected in both ionization modes.

The NMR analysis expanded our metabolic characterization, identifying 33 crucial metabolites spanning several biochemical categories. These included amino acids such as alanine, arginine, and glutamine, alcohols like ethanol and myoinositol, various ketoacids including 2-hydroxybutyrate, and carboxylic acids such as citrate and lactate. Beyond individual metabolites, we derived 36 clinically significant metabolite ratios that provide insights into critical metabolic pathway activities.

The blood metabolic profile changes significantly with age: a total of 106 metabolites showed statistically significant associations with age (Supplementary Table S8). Among these, seven lipid species—LPC 18:2, LPC O-16:0, PC O-16:1/18:1, LPC P-16:0, PC 18:2_22:6, PC P-18:0/18:1, and PE P-18:0/18:2—exhibited negative correlations with age (coefficient < −0.20). Conversely, twelve metabolites—PC 16:0_16:0, PC 16:0_18:3, TG 16:0_18:1_18:1, DG 18:1_18:1, MGDG 18:1_22:6, PE 18:1_20:0, SM d18:1/18:0, TG 18:0_18:1_18:2, OxPE 16:0_18:2(OOO), SM d20:0/16:0, PE 16:0_22:6, and PS 16:0_20:3—showed positive correlations with age (coefficient > 0.20).
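The age screening above can be sketched as a simple correlation filter. The type of correlation coefficient is not stated in this section, so Pearson's r is assumed here. The feature names and values are hypothetical, and only the ±0.20 cut-off is taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def age_associated(features, ages, threshold=0.20):
    """Split feature names into negatively and positively age-correlated lists."""
    negative, positive = [], []
    for name, values in features.items():
        r = pearson_r(values, ages)
        if r < -threshold:
            negative.append(name)
        elif r > threshold:
            positive.append(name)
    return negative, positive


# Hypothetical ages and feature intensities, for illustration only.
ages = [30, 40, 50, 60, 70]
features = {
    "feat_up": [1.0, 1.5, 2.1, 2.4, 3.0],
    "feat_down": [5.0, 4.2, 3.9, 3.1, 2.0],
    "feat_flat": [1.0, 1.2, 0.9, 1.2, 1.0],
}
negative, positive = age_associated(features, ages)
```

Features falling between the two thresholds, such as `feat_flat` here, are treated as having minimal age correlation.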

Through integrated analysis of both HPLC-MS and NMR datasets, we performed rigorous feature selection followed by advanced multivariate statistical approaches. This comprehensive strategy enabled identification of the most biologically and clinically relevant molecular signatures, establishing a robust foundation for subsequent biomarker discovery and pathway analysis [36].

2.3. Feature Selection

2.3.1. Comparative Performance and Stability of Feature Selection Methods in Binary Classification

SVM-Recursive Feature Elimination (SVM-RFE) was the most stable method for binary comparisons (Table 2). Its high stability is attributable to its large final marker sets: 85 markers for distinguishing benign versus malignant ovarian tumors, 13 for benign tumors versus controls, and 14 for malignant tumors versus controls (Table S1). In contrast, LASSO and Boruta exhibited extremely low stability, failing to produce consistent marker sets (Table S1 and Table 2).
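The stability measure itself is not defined in this section. One common choice, assumed here purely for illustration, is the mean pairwise Jaccard similarity between the marker sets selected on repeated resamples of the data. The run contents below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of feature names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(selected_sets):
    """Mean pairwise Jaccard similarity across repeated selections."""
    pairs = list(combinations(selected_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical marker sets from three resampling runs of two methods.
stable_runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f3"}, {"f1", "f2", "f4"}]
unstable_runs = [{"f1"}, {"f2"}, {"f3"}]

stable_score = selection_stability(stable_runs)
unstable_score = selection_stability(unstable_runs)
```

A score near 1 means that nearly the same markers are chosen in every run, while a score near 0 corresponds to a method that fails to produce a consistent marker set.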

Principal component (PC) space based on Mann–Whitney-selected markers yielded the optimal clustering results, with the lowest values for Hubert–Levin’s C-index and Davies–Bouldin index (Table 2). The SVM-RFE marker space performed worse than both the Mann–Whitney and Welch methods in terms of cluster compactness and separation. However, Orthogonal Projection on Latent Structures-Discriminant Analysis (OPLS-DA) markers achieved the highest Calinski–Harabasz pseudo-F statistic, followed closely by SVM-RFE markers (Table 3).
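For readers unfamiliar with these clustering indices, the sketch below computes the Davies–Bouldin index from its standard definition: the average, over clusters, of the worst-case ratio of within-cluster scatter to between-centroid distance. Lower values indicate more compact, better separated clusters. The two-dimensional points are hypothetical stand-ins for samples projected onto principal components.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance of each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D projections: compact, distant groups vs. diffuse, close groups.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]

db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

The compact, well-separated configuration yields the smaller index, which is why the lowest value is read as the best clustering result.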

When evaluating the overall performance of binary feature selection methods, both SVM-RFE and Mann–Whitney achieved the highest total score (23 points). However, SVM-RFE was preferred due to its superior stability. Despite their equal scores, the marker sets from these methods showed minimal overlap. For instance, only five lipids were common to both sets when distinguishing benign from malignant tumors: SM d18:1/22:0, SM d18:2/14:0, TG 10:0_18:2_18:2, TG 16:1_22:4_8:0, and TG 18:0_18:1_18:2 (Figure 1A). Similarly, PC 16:0_20:1 was the sole shared lipid between marker sets for controls versus benign tumors (Figure 1B), while CE 18:3, CE 20:4, and LPC 18:2 were common to both methods for controls versus malignant tumors (Figure 1C).

2.3.2. Comparative Performance and Stability of Feature Selection Methods in Multiclass Analysis

The Kruskal–Wallis and Partial Least Squares Discriminant Analysis (PLS-DA) selection methods demonstrated the highest quality metrics, with the Kruskal–Wallis approach showing slightly greater stability in its feature set (Table 4). In contrast, RF and LASSO-based methods failed to produce stable feature sets across iterations (Table S2).
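The Kruskal–Wallis filter extends the rank-based comparison to three or more groups. The following sketch computes the H statistic for one feature. It omits the tie-correction factor and the conversion to a p-value, both of which a full implementation would include, and the toy values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical intensities of one feature in control, benign and malignant groups.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
h_mixed = kruskal_wallis_h([1, 4, 7], [2, 5, 8], [3, 6, 9])
```

Features whose groups occupy distinct rank ranges receive a large H and would be retained by the filter, whereas interleaved groups give a value close to zero.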

A significant overlap was observed between the Kruskal–Wallis and PLS-DA-derived marker sets, with more than 25% of features shared between the two (Figure 2A). Notably, seven features were consistently identified across all multiclass and binary selection methods: SM d18:1/22:0, SM d18:1/22:1, TG 10:0_18:2_18:2, TG 16:1_22:4_8:0, TG 18:0_18:1_18:2, CE 20:4, and LPC 18:2 (Figure 2B, Table S3).

2.4. Machine Learning Models in Ovarian Tumor Classification

The machine learning models were optimized using Particle Swarm Optimization (PSO) to determine their ideal hyperparameters. This included three SVM configurations with distinct kernel functions (polynomial, radial basis function, and sigmoid), three neural network architectures (Multilayer Perceptron (MLP), CNN, and Residual Convolutional Neural Network (ResNet)), and an Extreme Gradient Boosting (XGBoost) model. The resulting optimal hyperparameter sets for each algorithm are comprehensively presented in Supplementary Table S4. For distinguishing between benign and malignant tumors, a CNN-based model utilizing the Mann–Whitney marker set achieved the highest performance, with accuracy and mean recall both at 0.81, along with 77% sensitivity and 84% specificity. In classifying benign tumors versus controls, an XGBoost model with SVM-RFE-selected features demonstrated exceptional results, achieving 0.96 accuracy and 0.95 mean recall, with perfect sensitivity (100%) and high specificity (90%). For malignant tumor versus control classification, both RF (SVM-RFE set) and XGBoost (Mann–Whitney set) models performed equally well, each showing 0.92 accuracy and mean recall, 93% sensitivity, and 90% specificity.
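In a two-class problem, mean recall is the average of sensitivity (recall of the positive class) and specificity (recall of the negative class). The short sketch below makes this arithmetic explicit and recomputes mean recall from the sensitivity and specificity values quoted above. The confusion-matrix counts in `example` are hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and mean recall from four counts."""
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mean_recall": (sensitivity + specificity) / 2,
    }


# Hypothetical confusion-matrix counts, for illustration only.
example = binary_metrics(tp=8, fn=2, tn=9, fp=1)

# (sensitivity, specificity) pairs as quoted in the paragraph above.
reported = {
    "malignant_vs_benign": (0.77, 0.84),
    "benign_vs_control": (1.00, 0.90),
    "malignant_vs_control": (0.93, 0.90),
}
recomputed = {
    task: (sens + spec) / 2 for task, (sens, spec) in reported.items()
}
```

The recomputed values are 0.805, 0.95, and 0.915, which agree with the reported mean recalls of 0.81, 0.95, and 0.92 up to rounding.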

In one-versus-one (OvO) classification, the CNN model with Mann–Whitney markers yielded the best results (accuracy: 0.78, mean recall: 0.77) (Figure 3A–D and Figure 4, Tables S5 and S6). Overall, models using Mann–Whitney-selected features showed non-significantly higher accuracy (median: 0.78, IQR: 0.67–0.85) and mean recall (0.85, IQR: 0.76–0.90) compared to those using SVM-RFE features (accuracy: 0.74, IQR: 0.67–0.90; mean recall: 0.74, IQR: 0.68–0.90; p = 0.28 for accuracy, p = 0.17 for recall).
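The OvO strategy trains one binary classifier per pair of classes and then combines their outputs. How the pairwise outputs were combined is not described in this section, so simple majority voting is assumed in the sketch below. The pairwise classifiers are hypothetical threshold rules on a single score, not the CNN models used in the study.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(sample, pairwise_classifiers, classes):
    """Predict a class by majority vote over all pairwise classifiers."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = pairwise_classifiers[(a, b)](sample)
        votes[winner] += 1
    # Ties are broken in favour of the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


classes = ["control", "benign", "malignant"]

# Hypothetical pairwise decision rules on a made-up "score" field.
pairwise = {
    ("control", "benign"): lambda x: "benign" if x["score"] > 1.0 else "control",
    ("control", "malignant"): lambda x: "malignant" if x["score"] > 2.0 else "control",
    ("benign", "malignant"): lambda x: "malignant" if x["score"] > 3.0 else "benign",
}

predictions = [
    ovo_predict({"score": s}, pairwise, classes) for s in (0.5, 2.5, 3.5)
]
```

With three classes there are three pairwise models, so each sample receives three votes and the class with at least two of them is returned.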

For multiclass classification, XGBoost with Kruskal–Wallis-selected features achieved the highest performance (accuracy: 0.78, mean recall: 0.77) (Figure 3E,F, Table S7). Models using Kruskal–Wallis markers consistently outperformed others, showing significantly higher accuracy (median 0.72, IQR 0.70–0.74) than PLS-DA-based models (median 0.66, IQR 0.63–0.67, p = 0.02) and marginally higher accuracy than the SVM-RFE (median 0.72, IQR 0.69–0.72, p = 0.07) and Mann–Whitney (median 0.72, IQR 0.67–0.73, p = 0.05) sets. Similarly, they demonstrated significantly better mean recall (median 0.74, IQR 0.71–0.77) than the PLS-DA (median 0.68, IQR 0.63–0.69) and Mann–Whitney (median 0.74, IQR 0.68–0.75) sets (p = 0.01 in both cases) and a non-significant improvement over SVM-RFE (median 0.73, IQR 0.70–0.73, p = 0.10).

Comparing the top multiclass models, XGBoost (Kruskal–Wallis set) and the OvO CNN (Mann–Whitney set) showed comparable overall accuracy and recall. However, XGBoost had lower malignant tumor recall (60%) and benign tumor precision (69%), but higher benign tumor recall (84%) and malignant tumor precision (90%) relative to the CNN model (87%, 95%, 63%, 68%, respectively) (Table 5). Finally, XGBoost (median accuracy: 0.74, IQR: 0.72–0.89; recall: 0.76, IQR: 0.72–0.89) and CNN models (accuracy: 0.76, IQR: 0.73–0.85; recall: 0.77, IQR: 0.74–0.88) performed similarly (p = 0.21 for accuracy, p = 0.57 for recall), while CNNs significantly outperformed MLP in both metrics (MLP accuracy: median 0.74, IQR 0.69–0.81, p = 0.02; MLP recall: median 0.75, IQR 0.69–0.83, p = 0.006).

3. Discussion

The identification of optimal feature sets represents a fundamental step in developing reliable predictive models for tumor classification, with significant implications for diagnostic accuracy and clinical decision-making [27]. Feature selection methods such as Mann–Whitney U tests and SVM-RFE employ distinct statistical approaches yet frequently generate models with similar overall performance metrics [37]. A closer examination reveals important distinctions between these approaches. Feature sets derived from Mann–Whitney testing demonstrate particularly strong performance in cluster separation quality, as quantified by established validation metrics. The lower Davies–Bouldin index values indicate more compact and well-separated clusters, while the lower Hubert–Levin C-index values reflect superior between-class discrimination. These properties suggest that Mann–Whitney-selected features may be particularly valuable for applications requiring clear pathological categorization, such as distinguishing between benign and malignant tumor subtypes.

The age disparity between OC patients and both benign tumor and control groups introduces a potential confounding factor, given known age-related changes in blood lipid and metabolite profiles [38]. However, only 18 features showed a weak age association (0.20 < |r| < 0.30), while 87 had minimal correlation (|r| ≤ 0.20), suggesting that malignancy status, rather than age, is the primary determinant of the observed differences.

In Mann–Whitney-based panels, four age-associated lipids (DG 18:1_18:1, PC 16:0_18:3, TG 16:0_18:1_18:1, and TG 18:0_18:1_18:2) appeared in both panels discriminating malignant tumors from control and benign groups. Another four (PE 16:0_22:6, PS 16:0_20:3, LPC 18:2, and PE P-18:0/18:2) were specific to the malignant vs. control comparison. In contrast, only two lipids (PC 16:0_16:0 and TG 18:0_18:1_18:2) were shared in the malignant vs. benign panel, and just one (LPC 18:2) overlapped between both malignant tumor comparisons in SVM-RFE-based panels (Supplementary Figure S1). This suggests that SVM-RFE panels are more age-stable than Mann–Whitney panels.

Notably, LPC 18:2—which was included in an OPLS-DA panel distinguishing controls from benign tumors (groups without significant age differences)—has been linked to pancreatic [39] and colorectal [40] cancer in age-matched studies. Similarly, elevated blood triglycerides correlate with increased ovarian cancer risk in age-adjusted cohorts [41], and PC 16:0_16:0 shows malignancy-associated elevation independent of age [42,43].

The SVM-RFE approach demonstrates complementary strengths in predictive modeling applications. Lopez and colleagues provided compelling evidence that models built using SVM-RFE selected features surpass those utilizing RF or Relief-based selection in terms of classification accuracy and biomarker [44]. This advantage likely stems from SVM-RFE’s iterative optimization process, which evaluates feature importance within the context of the classifier’s decision boundary rather than relying solely on univariate statistical tests [27]. Such characteristics make SVM-RFE particularly effective for complex discrimination tasks where multiple biomarkers interact in non-linear ways to determine pathological status [45].

However, as Barbieri’s research team demonstrated, the performance of any feature selection method depends critically on dataset characteristics [46]. Factors including sample size, class imbalance, measurement noise, and biological heterogeneity all substantially influence which selection approach proves most effective. For instance, in datasets with strong effect sizes and minimal confounding variables, simpler univariate methods like Mann–Whitney may suffice [47]. Conversely, in scenarios involving high-dimensional data with numerous correlated features, more sophisticated techniques like SVM-RFE or embedded methods may be necessary to capture complex biomarker interactions [48].

The choice of feature selection methodology also carries important implications for model translation into clinical practice [49]. While computationally intensive methods may achieve marginally better performance in research settings, simpler approaches often prove more practical for clinical implementation due to easier validation and interpretation. This trade-off between performance and practicality underscores the need for careful method selection aligned with the specific application requirements and implementation constraints [50]. Future research directions should focus on developing adaptive selection frameworks that can automatically adjust to dataset characteristics while maintaining biological interpretability and clinical relevance.

This study represents a significant advancement in OC diagnostics by being the first to systematically identify optimal machine learning algorithms for analyzing complex multi-omics data (plasma metabolites and lipids) in a large clinical cohort of 229 participants, including 103 OC patients. Our comprehensive evaluation demonstrates that XGBoost and RF models achieve an exceptional balanced accuracy of 92% for differentiation of OC from controls, with 93% sensitivity and 90% specificity, a performance level that compares favorably with established metabolomic approaches [51]. Notably, while Ban et al. found that SVM outperformed Adaptive Boosting and RF [51], our results underscore the critical importance of algorithm selection tailored to specific data characteristics and diagnostic objectives [52]. In classifying benign tumors versus controls, our XGBoost model with SVM-RFE-selected features achieved perfect sensitivity (100%) and high specificity (90%). Similarly, Fei Long et al. (2025) identified plasma extracellular vesicle metabolites as highly discriminative biomarkers, with SVM and RF models achieving an AUC of 0.94 in differentiating OC from benign tumors [11].

While plasma metabolites show great promise, protein-based markers remain clinically relevant, though their performance varies depending on analyte combinations and detection methods. Diagnostic panels incorporating fibrinogen, D-dimer, and the well-established CA-125 marker have achieved notable sensitivity (92%) and specificity (79%) in some studies [53]. More complex protein signatures, such as those combining CA125 with IGFBP2, SPP1, TSP1, and ADI, have demonstrated accuracy comparable to advanced XGBoost models [54,55,56]. Similarly, logistic regression models using IL-8 and TNFα [6], as well as neutrophil gelatinase-associated lipocalin/matrix metallopeptidase-9 complexes [5], have shown diagnostic performance on par with machine learning approaches. However, multi-analyte models exhibit considerable variability; for instance, combinations of CA125, CCL20, and menopausal status yielded reduced accuracy (77%) compared to lipidomic-based XGBoost models (81%) [7]. This underscores the necessity of rigorous biomarker selection and validation.

The MLP architecture represents a fundamental yet powerful type of artificial neural network particularly well-suited for clinical research applications. Its versatility stems from the ability to directly process diverse data types including -omics profiles, categorical clinical variables, and continuous numerical measurements [57,58,59,60]. This inherent flexibility has established MLP as a widely adopted approach across various clinical prediction tasks. Most notably, MLPs operate effectively on properly scaled data without necessitating transformation into pseudo-continuous representations. This characteristic significantly reduces preprocessing complexity and minimizes potential error introduction during data conversion steps [33].

The performance of MLPs varies significantly depending on the nature of the classification task, the dataset characteristics, and the comparative machine learning models [61]. In our study, MLPs exhibited lower diagnostic accuracy in binary classification tasks but demonstrated superior performance in multiclass problems. This aligns with findings from Wang et al., where MLPs outperformed SVMs, suggesting that their hierarchical learning structure may be better suited for complex, multi-category discrimination [62].

However, the effectiveness of MLPs is not universally consistent across studies. For instance, Long et al. reported that MLPs were less accurate than both RF and SVM models, contrasting with our observation that MLPs surpass Naive Bayes (NB) classifiers [11]. This discrepancy may stem from differences in dataset composition, feature selection, or model hyperparameter tuning. Interestingly, in the differential diagnosis of inflammatory myopathy subtypes, MLPs ranked below RF and SVM but still exceeded the performance of NB, reinforcing the notion that MLPs occupy an intermediate position among machine learning classifiers in certain biomedical applications [14].

Notably, MLPs exhibit strong diagnostic capabilities in specific clinical contexts. For example, in Parkinson’s disease detection, MLPs and XGBoost models achieve high classification accuracy, whereas RF and SVM underperform [15]. This suggests that neural network-based approaches may be particularly effective for neurodegenerative disorder diagnostics, possibly due to their ability to capture non-linear patterns in heterogeneous biomedical data.

CNNs have emerged as powerful tools for diagnostic applications, consistently outperforming traditional machine learning methods across multiple studies [63,64,65,66,67]. However, CNN efficacy is highly dependent on dataset characteristics, with sample size being a critical limiting factor. Several studies have reported significant performance degradation when CNNs are applied to smaller datasets [68,69,70]. This data-hungry nature of deep learning architectures means that in resource-constrained scenarios with limited sample availability, simpler machine learning methods may offer comparable diagnostic accuracy while providing additional benefits in terms of computational efficiency and interpretability [33].

Notably, our findings indicate that CNNs consistently outperform MLPs, particularly when the input data can be effectively transformed into an image-like representation. This performance gap highlights the importance of proper data structuring for neural network applications. The process of converting conventional tabular data into artificial image formats, while computationally intensive, appears justified by the subsequent improvements in model accuracy [71].

This study advances OC diagnostics through a large-scale integration of metabolomic and lipidomic data with machine learning. The approach has several strengths. First, the study leverages a substantial clinical cohort of 229 participants, including carefully matched comparison groups of benign ovarian neoplasms and healthy controls, enabling rigorous evaluation of differential diagnosis. Second, the implementation of strict selection criteria, including sample collection prior to any therapeutic intervention or surgical procedure, ensures minimal confounding from treatment effects while providing a clear window into disease-specific metabolic alterations.

The use of blood plasma as a biospecimen offers particular clinical advantages, being both minimally invasive and readily accessible for potential diagnostic implementation. Our deep metabolic profiling approach combines complementary analytical platforms: comprehensive lipidomic analysis using HPLC-MS in both ionization modes with MS/MS identification, coupled with NMR-based characterization of the low-molecular-weight metabolome. This dual-platform strategy provides exceptional coverage of both hydrophobic and hydrophilic metabolite fractions, capturing a more complete metabolic signature than single-platform approaches.

From a computational perspective, the study makes three key contributions: (1) an exhaustive systematic comparison of feature selection methods and classification algorithms, revealing context-dependent performance advantages; (2) a sophisticated evaluation of binary versus multiclass classification strategies, including OvO architecture benchmarking; and (3) implementation of PSO to efficiently explore over 400 hyperparameter combinations per method-task pairing, ensuring robust model configuration. The resulting models demonstrated high discriminatory power, with XGBoost achieving 96% accuracy in benign versus control classification and CNNs reaching 81% accuracy in malignant versus benign differentiation. Beyond diagnostic performance, these models provide valuable biological insights into OC metabolism through their identified feature signatures. Furthermore, the study establishes a methodological framework that could be extended to other cancers or multi-omics investigations.
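To make the PSO step more concrete, the following is a minimal sketch of a particle swarm minimizing a toy objective. It does not reproduce the study's configuration: the swarm size, inertia and acceleration coefficients, the search bounds, and the quadratic `toy_validation_error` (a stand-in for a cross-validated error over two hypothetical log-scaled hyperparameters) are all assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[best_idx][:]
    global_score = personal_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Move and clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at (1, -2)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(
    toy_validation_error, [(-5.0, 5.0), (-5.0, 5.0)]
)
```

In a real pipeline, each objective evaluation would train and validate a model, so the number of particles multiplied by the number of iterations determines how many hyperparameter combinations are explored.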

While this study provides promising insights, several limitations should be acknowledged. First, although our cohort was substantial, the relatively small number of healthy controls may affect the generalizability of results. While this imbalance reflects real-world referral patterns in specialty centers, it underscores the need for cautious interpretation. To address this imbalance, we employed safe-level SMOTE (synthetic minority oversampling technique) to generate fifty synthetic control samples while carefully preserving data integrity. Importantly, we conducted feature selection prior to SMOTE application to minimize its impact on feature variability. This approach has proven effective in numerous clinical studies facing similar imbalance challenges [57,72,73,74,75], consistently improving model performance metrics including F-value and Youden index [76,77,78,79,80]. While SMOTE enhances minority class representation by increasing inter-sample correlations, it maintains the original sample structure without artificially altering internal data relationships [81]. Recent comparative analyses, including the work by Welvaars et al. (2023), demonstrate that while resampling methods like SMOTE improve classification performance, they may introduce overestimation of positive predictions [82]. This finding emphasizes the importance of carefully defining clinical prediction tasks when implementing such techniques. Our methodological approach, combining prudent feature selection with targeted SMOTE application, represents a balanced strategy for developing more robust clinical decision support tools while acknowledging these inherent limitations.
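The core idea behind SMOTE-type oversampling is linear interpolation between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows plain SMOTE only. The safe-level variant used in this study additionally weights the interpolation by how many minority samples surround each point, which is omitted here. The control profiles and the neighbourhood size are hypothetical.

```python
import math
import random


def smote_samples(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen base sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature profiles of four control samples.
controls = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_samples(controls, n_synthetic=50)
```

Because every synthetic point lies on a segment between two real samples, the generated values stay within the range of the original minority class. This is also why such points are more strongly correlated with existing samples than independent observations would be.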

The single-center design presents another constraint. Despite robust internal validation, our models’ performance may not fully translate to broader populations due to variations in demographics, imaging protocols, and diagnostic workflows across institutions. While PSO successfully enhanced model discrimination by maximizing predictive performance, it cannot eliminate the risk of overfitting. The lack of external validation remains a critical concern, particularly given the variability observed in biomarker selection across different feature selection methods. These methodological challenges highlight the essential need for independent validation in diverse cohorts and verification through orthogonal analytical approaches.

From a translational perspective, while our machine learning models performed well, their clinical adoption faces challenges. Complex algorithms, particularly deep learning, often lack interpretability, which is critical for physician trust and regulatory approval. Moreover, we did not evaluate practical implementation barriers such as cost-effectiveness, assay reproducibility, or workflow integration—key factors for real-world utility.

Despite these limitations, our findings lay groundwork for developing liquid biopsy tests that could complement existing diagnostics. The identified metabolic signatures may provide new insights into OC pathogenesis and reveal novel therapeutic targets. To facilitate translation, we propose: (1) prospective multicenter validation with standardized protocols, (2) expanded cohorts to enhance statistical power, and (3) integration of multi-omics data to improve predictive accuracy and biological insight.

4. Materials and Methods

4.1. Study Design

The OC cohort comprised patients who underwent cytoreductive surgery at the V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology, and Perinatology (NMRC for OGP, Moscow, Russia) between November 2019 and July 2020. The study included 229 participants divided into three distinct groups. The OC group consisted of 103 patients with histologically verified serous ovarian tumors, comprising 59 cases of high-grade serous carcinoma (including 10 at FIGO stages IA-IIB and 49 at stages IIC-IVA), 16 low-grade serous carcinomas, and 28 serous borderline tumors. For comparison, 107 patients with benign ovarian pathologies were enrolled, including 30 serous cystadenomas, 56 endometrioid cysts, and 21 mature teratomas. Additionally, 19 healthy women without any ovarian pathology formed the control group, with their status confirmed through comprehensive clinical evaluation involving detailed medical history, pelvic ultrasound examination, complete blood tests (both clinical and biochemical parameters), and assessment of specific tumor markers (CA125 and HE4) accompanied by ROMA and RMI calculations.

The study adhered to the ethical standards of the institutional research committee, Russian federal laws, and the 1964 Helsinki Declaration (and its later amendments). Written informed consent was obtained from all participants, and the study protocol (No. 10, 5 December 2019) was approved by the NMRC for OGP Ethics Committee.

Patients in the OC group were required to have histologically confirmed serous ovarian carcinoma within FIGO stages I-IV, while those in the comparison group needed histological verification of benign ovarian lesions (serous cystadenomas, endometrioid cysts, or mature teratomas).

Uniform exclusion criteria applied to all study groups included: (1) age < 18 years; (2) current or recent (≤6 months) hormonal therapy (oral contraceptives or hormone replacement therapy); (3) confirmed BRCA mutations; and (4) significant comorbidities including diabetes mellitus, active inflammatory/infectious diseases, or current pregnancy. The OC group had additional exclusions for patients with primary multiple malignancies or mixed epithelial ovarian tumor histologies. Participants in the benign lesion comparison group and the healthy control group were additionally excluded if they had any history of pelvic surgery or a prior malignancy diagnosis.

Blood samples were collected preoperatively in K2EDTA vacutainer tubes prior to administration of any perioperative medications (including antibiotics and analgesics). Whole blood was immediately processed by two-step centrifugation: first at 300× g for 20 min at 4 °C to separate cellular components, followed by collection of the supernatant which underwent secondary centrifugation at 12,000× g for 10 min at room temperature to obtain platelet-poor plasma. The final plasma aliquots were transferred to pre-labeled cryovials using wide-bore pipette tips to minimize shear stress, and immediately stored at −80 °C in a monitored freezer until analysis.

4.2. Lipidomic Analysis of Blood Plasma Samples (HPLC-MS)

Lipidomic profiling was conducted using an established laboratory protocol [13,83,84]. Plasma lipid extraction was performed via a modified Folch method where 40 μL of plasma was mixed with 480 μL of chloroform:methanol (2:1, v/v) and vortexed in an ultrasonic bath for 10 min. After adding 150 μL of deionized water, the mixture was centrifuged at 13,000× g for 5 min at 20 °C. The organic phase containing lipids was collected, evaporated under a gentle nitrogen stream, and reconstituted in 200 μL of isopropanol:acetonitrile (2:1, v/v).

To ensure analytical reliability, pooled quality control (QC) samples were prepared by combining equal 50 μL aliquots from all study participants’ samples, creating a representative reference matrix. Blank samples were prepared with the isopropanol:acetonitrile (2:1, v/v) solvent mixture. QC samples were injected after every 10 study samples throughout the HPLC-MS batch runs. In each batch, the first three injections were blanks, and a blank sample was analyzed before each QC sample.
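The injection scheme described above can be sketched as a short Python function that builds a run order for one batch. This is only an illustration under stated assumptions: the function name `build_run_order` and the sample identifiers are hypothetical, and whether a QC sample was also placed at the very start or end of a batch is not stated in the text, so none is added here.

```python
def build_run_order(sample_ids, qc_every=10, leading_blanks=3):
    """Return an injection sequence for one batch.

    The batch starts with `leading_blanks` blanks; after every
    `qc_every` study samples a blank followed by a pooled QC is injected.
    """
    order = ["blank"] * leading_blanks
    for i, sample in enumerate(sample_ids, start=1):
        order.append(sample)
        if i % qc_every == 0:
            order.extend(["blank", "QC"])
    return order


# Hypothetical batch of 25 study samples labelled S001 ... S025.
run = build_run_order([f"S{i:03d}" for i in range(1, 26)])
```

For this toy batch the sequence contains three leading blanks, 25 study samples, and two blank/QC pairs (after S010 and S020).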

Chromatographic separation was achieved using an Ultimate 3000 HPLC system (Thermo Scientific, Bremen, Germany) coupled to a Maxis Impact qTOF mass spectrometer (Bruker Daltonics, Bremen, Germany). Separation was performed on a Zorbax XDB-C18 column (250 × 0.5 mm, 5 μm; Agilent, Santa Clara, CA, USA) maintained at 50 °C with a flow rate of 35 μL/min. The mobile phase consisted of Eluent A (10 mM ammonium formate with 0.1% formic acid in water:acetonitrile [40:60, v/v]) and Eluent B (10 mM ammonium formate with 0.1% formic acid in isopropanol:acetonitrile:water [90:8:2, v/v/v]). A linear gradient increased Eluent B from 30% to 95% over 25 min.

MS analysis of study samples was performed in both positive (400–1500 m/z) and negative (100–1000 m/z) ionization modes with capillary voltages of +4.1 kV and −3.0 kV, respectively. The nebulizer gas pressure was maintained at 0.7 bar with a dry gas flow of 6 L/min at 200 °C. For comprehensive lipid identification, data-dependent MS/MS acquisition was performed on QC samples. The instrument dynamically selected the top three most intense precursor ions from each full scan for fragmentation, applying a normalized collision energy of 35 eV. A dynamic exclusion window of 60 s was implemented to prevent repeated fragmentation of dominant ions, ensuring broader coverage of lower-abundance species.

Lipid identification was performed using LipidMatch [85] after data preprocessing, with inter-batch normalization by autoscaling [86]. All lipid species are reported according to LIPID MAPS classification [87].

4.3. Metabolomic Analysis by NMR Spectroscopy

Plasma metabolomic profiling was performed using 700 MHz NMR spectroscopy. Two phosphate buffer systems were prepared: Buffer A consisted of 80:20 H₂O/D₂O (v/v) sodium-phosphate buffer (pH 7.4) containing 6.15 mM sodium azide (NaN₃) and 4.64 mM 3-(trimethylsilyl)propionic-2,2,3,3-d₄ acid (TSP, Cambridge Isotope Laboratories Inc., Leicestershire, UK) sodium salt as an internal reference. Buffer B contained sodium-phosphate buffer in D₂O (pH 7.4) with 1.5 M K₂HPO₄, 2 mM NaN₃, and 4 mM TSP. For analysis, 120 μL of plasma was mixed with 120 μL of buffer solution, and 190 μL of this mixture was transferred to 5 mm NMR tubes (Bruker BioSpin Ltd., Ettlingen, Germany) and maintained at 6 °C until measurement.

All ¹H-NMR spectra were acquired on a Bruker 700 MHz AVANCE NEO spectrometer (Bruker BioSpin, Ettlingen, Germany) equipped with a Prodigy cryogenic probe at 37 °C, with temperature calibration performed using d₄-methanol (99.8% purity). The acquisition employed a Carr–Purcell–Meiboom–Gill (CPMG) pulse sequence with presaturation for water suppression, incorporating 128 refocusing pulses (0.6 ms echo delay each) for a total T₂ filtering period of 78 ms. Following 4 dummy scans, spectra were collected with 73,728 data points across a 12,019 Hz spectral width.

Metabolite identification was performed using Bruker Biorefcode (Bruker BioSpin, Ettlingen, Germany) by matching both 1D and 2D J-resolved spectra against reference libraries. Semi-automated quantification was conducted using Chenomx NMR Suite 9.0 (Chenomx Inc., Edmonton, AB, Canada), with metabolite concentrations calculated relative to the 0.4 mM TSP reference standard [36].

4.4. Feature Selection and Stability Analysis

Lipidomic and metabolomic data were integrated and processed to reduce feature dimensionality. For binary classification, seven feature selection methods were employed: (1) Wilcoxon–Mann–Whitney test (p < 0.05), (2) Welch’s t-test (p < 0.05), (3) OPLS-DA with VIP > 1 [88], (4) RF (top √n features by Gini index) [89], (5) SVM-RFE based on SVM weights (iterative elimination until model accuracy decreases) [90], (6) LASSO (non-zero coefficients) [91], and (7) Boruta (all-relevant selection) [92]. For multiclass classification, five methods were used: (1) Kruskal–Wallis test (p < 0.05), (2) PLS-DA with VIP > 1, (3) RF (top √n features) [89], (4) LASSO (non-zero coefficients) [91], and (5) Boruta (all-relevant selection) [92] (Figure 5A).
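As an illustration of the simplest of these filters, the sketch below applies a Wilcoxon–Mann–Whitney test to each feature and keeps those with p < 0.05. It is a minimal Python approximation, not the procedure used in the study (which was run in R): the p-value comes from the normal approximation of the U statistic without tie or continuity correction, and the function names, feature names, and values are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation of the U statistic.

    Tied values receive average ranks; no tie or continuity correction
    is applied, so this is a rough approximation for small samples.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(group_a, group_b, alpha=0.05):
    """Keep features whose two-group comparison gives p < alpha."""
    return [f for f in group_a
            if mann_whitney_p(group_a[f], group_b[f]) < alpha]


# Made-up intensities for two hypothetical features in two groups.
malignant = {"lipid_1": [1, 2, 3, 4, 5, 6, 7, 8],
             "lipid_2": [1, 3, 5, 7, 9, 11, 13, 15]}
benign = {"lipid_1": [11, 12, 13, 14, 15, 16, 17, 18],
          "lipid_2": [2, 4, 6, 8, 10, 12, 14, 16]}
selected = select_features(malignant, benign)
```

In this toy example only the fully separated feature passes the filter; the interleaved one does not.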

Method stability was assessed through 100 iterations on randomly selected 70% subsamples. Features selected in every iteration were considered robust group discriminators. The robustness of each feature selection method was further quantified using Koch’s biotic diversity index [93]. A principal component (PC) space was constructed from the final selected features, and three cluster validation metrics were computed: (1) Hubert–Levin’s normalized C-index [94], (2) Davies–Bouldin index [95], and (3) Calinski–Harabasz pseudo-F statistic [96] (Figure 5A). Missing metric values were imputed with worst-case placeholders: 100 for the C-index and Davies–Bouldin index, for which lower values indicate better cluster separation, and −100 for the Calinski–Harabasz statistic, for which higher values indicate better separation.
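The subsampling step can be sketched as follows. This minimal Python example repeats a feature selector on random 70% subsamples and reports the features chosen in every iteration together with each feature's selection frequency. The selector `mean_gap_selector` is a deliberately simple, hypothetical stand-in for the methods listed in this section, and the toy data are made up.

```python
import random


def stable_features(samples, labels, select, n_iter=100, frac=0.7, seed=0):
    """Run `select` on random subsamples.

    Returns the features selected in every iteration and the
    selection frequency of each feature that was selected at least once.
    """
    rng = random.Random(seed)
    n = len(samples)
    size = round(frac * n)
    counts = {}
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)
        chosen = select([samples[i] for i in idx], [labels[i] for i in idx])
        for f in chosen:
            counts[f] = counts.get(f, 0) + 1
    always = sorted(f for f, c in counts.items() if c == n_iter)
    freq = {f: c / n_iter for f, c in counts.items()}
    return always, freq


def mean_gap_selector(samples, labels, threshold=1.0):
    """Hypothetical selector: keep feature indices whose class means
    differ by more than `threshold`."""
    selected = []
    for j in range(len(samples[0])):
        a = [s[j] for s, l in zip(samples, labels) if l == 0]
        b = [s[j] for s, l in zip(samples, labels) if l == 1]
        if abs(sum(a) / len(a) - sum(b) / len(b)) > threshold:
            selected.append(j)
    return selected


# Toy data: feature 0 separates the classes, feature 1 does not.
samples = ([[0.1 * i, float(i % 3)] for i in range(10)]
           + [[5.0 + 0.1 * i, float(i % 3)] for i in range(10)])
labels = [0] * 10 + [1] * 10
always, freq = stable_features(samples, labels, mean_gap_selector)
```

Here only feature 0 is returned as consistently selected. A stability index such as the one cited above would then summarize how similar the selected sets are across iterations; that computation is not reproduced in this sketch.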

Each method was assigned a rank score on every metric (from 7 down to 1 for binary classification and from 5 down to 1 for multiclass classification), and the features of the method with the highest overall score advanced to model selection.
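One way to read this ranking scheme is sketched below in Python. It assumes that rank points are simply summed across metrics and that ties do not occur; neither detail is stated in the text. The method names and metric values are invented for illustration and are not results from this study.

```python
def rank_methods(scores, higher_is_better):
    """Sum rank points per method across metrics.

    scores: {method: {metric: value}}
    higher_is_better: {metric: bool}
    On each metric the worst method gets 1 point and the best gets
    len(scores) points (ties are not handled in this sketch).
    """
    methods = list(scores)
    totals = {m: 0 for m in methods}
    for metric, hib in higher_is_better.items():
        # Order from worst to best so that points increase with quality.
        ordered = sorted(methods, key=lambda m: scores[m][metric],
                         reverse=not hib)
        for points, m in enumerate(ordered, start=1):
            totals[m] += points
    return totals


# Invented values for three hypothetical selection methods.
scores = {
    "method_A": {"stability": 0.8, "c_index": 0.20,
                 "davies_bouldin": 1.1, "calinski_harabasz": 40.0},
    "method_B": {"stability": 0.3, "c_index": 0.25,
                 "davies_bouldin": 1.3, "calinski_harabasz": 35.0},
    "method_C": {"stability": 0.5, "c_index": 0.30,
                 "davies_bouldin": 0.9, "calinski_harabasz": 20.0},
}
direction = {"stability": True, "c_index": False,
             "davies_bouldin": False, "calinski_harabasz": True}
totals = rank_methods(scores, direction)
best_method = max(totals, key=totals.get)
```

With these invented numbers, method_A collects 11 points, method_C 7, and method_B 6, so method_A's features would advance.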

4.5. Classification Model Selection

To address the significant class imbalance in our dataset (control:benign:OC ratio of 1:5.6:5.4), we employed the safe-level Synthetic Minority Over-sampling Technique (SMOTE) [76], generating 50 synthetic control samples to improve model training. The balanced dataset was then partitioned into training (70%) and test (30%) sets while preserving the original distribution patterns. For binary classification, eleven methods were evaluated: NB, OPLS-DA, RF, SVM (with linear, polynomial, radial, and sigmoid kernels), XGBoost, MLP, CNN, and ResNet. Multiclass classification was performed using seven methods: NB, PLS-DA, RF, XGBoost, MLP, CNN, and ResNet (Figure 5B,C). Additionally, we implemented a one-versus-one (OvO) strategy for binary classifiers, in which classification accuracy scores from the individual binary models were aggregated to enhance multiclass prediction performance.
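The OvO idea can be illustrated with a short Python sketch in which each pairwise classifier casts a vote and the class with the largest total wins. The optional weighting of votes by each pairwise model's accuracy is only one possible reading of the aggregation described above and is an assumption, as are the threshold "classifiers" and the weights, which are placeholders rather than trained models.

```python
def ovo_predict(x, pairwise_models, weights=None):
    """One-versus-one prediction by (optionally weighted) voting.

    pairwise_models: {(class_a, class_b): callable returning one of the two}
    weights: optional {(class_a, class_b): vote weight}, e.g. the
             accuracy of that pairwise model (an assumption of this sketch).
    """
    votes = {}
    for pair, model in pairwise_models.items():
        winner = model(x)
        w = 1.0 if weights is None else weights[pair]
        votes[winner] = votes.get(winner, 0.0) + w
    return max(votes, key=votes.get), votes


# Placeholder pairwise rules on a single made-up feature value x.
models = {
    ("control", "benign"): lambda x: "control" if x < 1 else "benign",
    ("control", "OC"): lambda x: "control" if x < 2 else "OC",
    ("benign", "OC"): lambda x: "benign" if x < 3 else "OC",
}
# Hypothetical accuracies used as vote weights.
acc = {("control", "benign"): 0.9,
       ("control", "OC"): 0.8,
       ("benign", "OC"): 0.7}

label_low, _ = ovo_predict(0.5, models)
label_mid, _ = ovo_predict(2.5, models, weights=acc)
label_high, _ = ovo_predict(4.0, models)
```

With three classes there are three pairwise models, and a sample is assigned to the class that wins most of its pairwise comparisons.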

For CNN and ResNet models, input data were transformed into 2D representations using the DeepInsight methodology [97]. These models incorporated a GELU activation layer followed by dropout (rate = 0.1). All neural networks were trained with an initial learning rate of 0.01 and Adamax optimization; the learning rate decayed at 0.5 for all models except the binary ResNet, which used a decay rate of 0.9. Hyperparameter tuning for SVMs, XGBoost, and the architectures of MLP, CNN, and ResNet was performed via PSO [98].
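For readers unfamiliar with PSO, the following minimal Python sketch shows the basic update rule on a toy problem. The objective `toy_error` is a made-up stand-in for a validation error over two hyperparameters on a log scale, and the swarm size, inertia weight `w`, and acceleration coefficients `c1` and `c2` are common textbook defaults, not the settings used in this study.

```python
import random


def pso(objective, bounds, n_particles=15, n_iter=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_error(params):
    """Made-up error surface with its minimum at (1, -2)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, err = pso(toy_error, [(-3.0, 3.0), (-5.0, 1.0)])
```

In an actual tuning run the objective would train and evaluate a model for each candidate hyperparameter set, which is far more expensive than this toy function; as noted in the limitations, optimizing such a score does not by itself protect against overfitting.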

All analyses were conducted in R 4.3.3 using the following packages: ropls [99], randomForest [100], e1071 [101], glmnet [102], Boruta [92], xgboost [103], clusterSim [104], smotefamily [105], lsa [106], tsne [107], cxhull [108], caret [109], and keras [110].

5. Conclusions

The integration of multi-omics data with advanced machine learning algorithms represents a transformative approach in OC diagnostics. Our large-scale analysis of plasma metabolites and lipids systematically evaluated different feature selection strategies and machine learning models for tumor classification, comparing their performance in both binary and multiclass settings. Among binary classification approaches, SVM-RFE and Mann–Whitney methods demonstrated comparable performance scores. However, SVM-RFE emerged as the preferred choice due to its significantly higher stability (mean stability score: 0.75 vs. 0.40), despite limited biomarker overlap between the two methods.

For multiclass classification, Kruskal–Wallis and PLS-DA-based selection methods demonstrated equivalent performance metrics, with Kruskal–Wallis showing slightly superior feature selection stability (0.47 compared to 0.46). These methods exhibited substantial feature overlap, sharing more than 25% common markers while identifying seven consensus biomarkers across all selection approaches.

The machine learning model analysis yielded several critical findings. CNN architectures utilizing Mann–Whitney selected features achieved optimal performance in malignant versus benign classification, attaining 81% accuracy and mean recall. XGBoost models utilizing SVM-RFE features excelled in benign versus control classification, achieving exceptional 95% accuracy with perfect 100% sensitivity. In multiclass evaluation, XGBoost models incorporating Kruskal–Wallis selected features reached the highest classification accuracy of 78%, representing statistically significant improvement over alternative methods.

The clinical implementation of these advanced diagnostic models will require careful consideration of practical factors, including assay standardization, reproducibility across platforms, and integration with existing clinical workflows. Nevertheless, the demonstrated performance of machine learning-driven multi-omics analysis offers a promising path toward more accurate, earlier, and potentially more accessible OC detection, addressing a critical unmet need in women’s health care.

From a translational perspective, these findings open new avenues for developing liquid biopsy tests that could complement or even reduce reliance on current diagnostic methods. The identification of robust metabolic and lipidomic signatures through machine learning approaches may also provide insights into OC pathogenesis and reveal new therapeutic targets. As the field progresses, continued refinement of these models through larger multicenter studies and the incorporation of additional -omics layers (such as proteomics and transcriptomics) may further enhance their diagnostic and prognostic utility.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26146630/s1.

Author Contributions

Conceptualization, A.T., M.I. and N.S.; data curation, M.I., A.N., V.C., E.K. and V.F.; formal analysis, A.T., M.I., N.S., E.K. and V.C.; funding acquisition, M.I., V.F. and G.S.; investigation, M.I., N.S., V.C., A.N. and E.K.; methodology, N.S., A.T., V.F. and G.S.; project administration, M.I., V.F. and G.S.; resources, V.F., V.C., N.S. and G.S.; software, A.T., E.K. and A.N.; supervision, M.I., V.F. and G.S.; visualization, N.S., V.C., A.T., E.K. and A.N.; writing—original draft, N.S., M.I., A.T., V.C., A.N. and E.K.; writing—review and editing, V.F. and G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation (No. 24-25-00407).

Institutional Review Board Statement

This study was approved by the Ethical Committee of the National Medical Research Center for Obstetrics, Gynecology, and Perinatology named after Academician V.I. Kulakov (protocol No. 10, dated 5 December 2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

Data are contained within the Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
CE	Cholesterol esters
Cer(P)	Ceramide (phosphate)
CNN	Convolutional Neural Network
DG	Diacylglycerols
FIGO	International Federation of Gynecology and Obstetrics
LASSO	Least Absolute Shrinkage and Selection Operator
MLP	Multilayer Perceptron
NB	Naive Bayes
OPLS-DA	Orthogonal Projections to Latent Structures-Discriminant Analysis
PLS-DA	Partial Least Squares Discriminant Analysis
PC (P-/O-)	(plasmenyl-/plasmanyl) Phosphatidylcholine
PE (P-/O-)	(plasmenyl-/plasmanyl) Phosphatidylethanolamine
PSO	Particle Swarm Optimization
ResNet	Residual Convolutional Neural Network
RFE	Recursive Feature Elimination
RF	Random Forest
SM	Sphingomyelins
SVM	Support Vector Machine
TG	Triacylglycerol
XGBoost	Extreme Gradient Boosting
VIP	Variable Importance in Projection

References

International Agency for Research on Cancer. Global Cancer Statistics; International Agency for Research on Cancer: Lyon, France, 2022. [Google Scholar]
Liest, A.L.; Omran, A.S.; Mikiver, R.; Rosenberg, P.; Uppugunduri, S. RMI and ROMA are equally effective in discriminating between benign and malignant gynecological tumors: A prospective population-based study. Acta Obstet. Gynecol. Scand. 2019, 98, 24–33. [Google Scholar] [CrossRef] [PubMed]
Henderson, J.T.; Webber, E.M.; Sawaya, G.F. Screening for Ovarian Cancer: An Updated Evidence Review for the U.S. Preventive Services Task Force; Agency for Healthcare Research and Quality (US): Rockville, MD, USA, 2018. [Google Scholar]
Matsas, A.; Stefanoudakis, D.; Troupis, T.; Kontzoglou, K.; Eleftheriades, M.; Christopoulos, P.; Panoskaltsis, T.; Stamoula, E.; Iliopoulos, D.C. Tumor Markers and Their Diagnostic Significance in Ovarian Cancer. Life 2023, 13, 1689. [Google Scholar] [CrossRef]
Gupta, R.K.; Dholariya, S.; Radadiya, M.; Agarwal, P. NGAL/MMP-9 as a Biomarker for Epithelial Ovarian Cancer: A Case–Control Diagnostic Accuracy Study. Saudi J. Med. Med. Sci. 2022, 10, 25–30. [Google Scholar] [CrossRef] [PubMed]
Pawlik, W.; Pawlik, J.; Kozłowski, M.; Łuczkowska, K.; Kwiatkowski, S.; Kwiatkowska, E.; Machaliński, B.; Cymbaluk-Płoska, A. The clinical importance of il-6, il-8, and tnf-α in patients with ovarian carcinoma and benign cystic lesions. Diagnostics 2021, 11, 1625. [Google Scholar] [CrossRef]
Sakares, W.; Wongkhattiya, W.; Vichayachaipat, P.; Chaiwut, C.; Yodsurang, V.; Nutthachote, P. Accuracy of CCL20 expression level as a liquid biopsy-based diagnostic biomarker for ovarian carcinoma. Front. Oncol. 2022, 12, 1038835. [Google Scholar] [CrossRef] [PubMed]
De Silva, S.; Alli-Shaik, A.; Gunaratne, J. Machine Learning-Enhanced Extraction of Biomarkers for High-Grade Serous Ovarian Cancer from Proteomics Data. Sci. Data 2024, 11, 685. [Google Scholar] [CrossRef]
Ning, L.; Lang, J.; Wu, L. Plasma circN4BP2L2 is a promising novel diagnostic biomarker for epithelial ovarian cancer. BMC Cancer 2022, 22, 6. [Google Scholar] [CrossRef]
Rong, J.; Sun, G.; Zhu, J.; Zhu, Y.; Chen, Z. Combination of plasma-based lipidomics and machine learning provides a useful diagnostic tool for ovarian cancer. J. Pharm. Biomed. Anal. 2025, 253, 116559. [Google Scholar] [CrossRef]
Long, F.; Pu, X.Y.; Wang, X.; Ma, D.X.; Gao, S.H.; Shi, J.; Zhong, X.C.; Ran, R.; Wang, L.L.; Chen, Z.; et al. A metabolic fingerprint of ovarian cancer: A novel diagnostic strategy employing plasma EV-based metabolomics and machine learning algorithms. J. Ovarian Res. 2025, 18, 26. [Google Scholar] [CrossRef]
Chagovets, V.; Starodubtseva, N.; Tokareva, A.; Novoselova, A.; Patysheva, M.; Larionova, I.; Prostakishina, E.; Rakina, M.; Kazakova, A.; Topolnitskiy, E.; et al. Specific changes in amino acid profiles in monocytes of patients with breast, lung, colorectal and ovarian cancers. Front. Immunol. 2023, 14, 1332043. [Google Scholar] [CrossRef]
Iurova, M.V.; Chagovets, V.V.; Pavlovich, S.V.; Starodubtseva, N.L.; Khabas, G.N.; Chingin, K.S.; Tokareva, A.O.; Sukhikh, G.T.; Frankevich, V.E. Lipid Alterations in Early-Stage High-Grade Serous Ovarian Cancer. Front. Mol. Biosci. 2022, 9, 770983. [Google Scholar] [CrossRef] [PubMed]
Liu, D.; Zhao, L.; Jiang, Y.; Li, L.; Guo, M.; Mu, Y.; Zhu, H. Integrated analysis of plasma and urine reveals unique metabolomic profiles in idiopathic inflammatory myopathies subtypes. J. Cachexia Sarcopenia Muscle 2022, 13, 2456–2472. [Google Scholar] [CrossRef] [PubMed]
Zhang, J.D.; Xue, C.; Kolachalama, V.B.; Donald, W.A. Interpretable Machine Learning on Metabolomics Data Reveals Biomarkers for Parkinson’s Disease. ACS Cent. Sci. 2023, 9, 1035–1045. [Google Scholar] [CrossRef] [PubMed]
Yan, Q.; He, D.; Walker, D.I.; Uppal, K.; Wang, X.; Orimoloye, H.T.; Jones, D.P.; Ritz, B.R.; Heck, J.E. The neonatal blood spot metabolome in retinoblastoma. EJC Paediatr. Oncol. 2023, 2, 100123. [Google Scholar] [CrossRef]
Pyragius, C.E.; Fuller, M.; Ricciardelli, C.; Oehler, M.K. Aberrant lipid metabolism: An emerging diagnostic and therapeutic target in ovarian cancer. Int. J. Mol. Sci. 2013, 14, 7742–7756. [Google Scholar] [CrossRef]
Ahmed-Salim, Y.; Galazis, N.; Bracewell-Milnes, T.; Phelps, D.L.; Jones, B.P.; Chan, M.; Munoz-Gonzales, M.D.; Matsuzono, T.; Smith, J.R.; Yazbek, J.; et al. The application of metabolomics in ovarian cancer management: A systematic review. Int. J. Gynecol. Cancer 2021, 31, 754–774. [Google Scholar] [CrossRef]
Fan, L.; Zhang, W.; Yin, M.; Zhang, T.; Wu, X.; Zhang, H.; Sun, M.; Li, Z.; Hou, Y.; Zhou, X.; et al. Identification of metabolic biomarkers to diagnose epithelial ovarian cancer using a UPLC/QTOF/MS platform. Acta Oncol. 2012, 51, 473–479. [Google Scholar] [CrossRef]
Wörheide, M.A.; Krumsiek, J.; Kastenmüller, G.; Arnold, M. Multi-omics integration in biomedical research—A metabolomics-centric review. Anal. Chim. Acta 2021, 1141, 144–162. [Google Scholar] [CrossRef]
Papoutsoglou, G.; Tarazona, S.; Lopes, M.B.; Klammsteiner, T.; Ibrahimi, E.; Eckenberger, J.; Novielli, P.; Tonda, A.; Simeon, A.; Shigdel, R.; et al. Machine learning approaches in microbiome research: Challenges and best practices. Front. Microbiol. 2023, 14, 1261889. [Google Scholar] [CrossRef]
Brix, F.; Demetrowitsch, T.; Jensen-Kroll, J.; Zacharias, H.U.; Szymczak, S.; Laudes, M.; Schreiber, S.; Schwarz, K. Evaluating the Effect of Data Merging and Postacquisition Normalization on Statistical Analysis of Untargeted High-Resolution Mass Spectrometry Based Urinary Metabolomics Data. Anal. Chem. 2024, 96, 33–40. [Google Scholar] [CrossRef]
Chua, A.E.; Pfeifer, L.D.; Sekera, E.R.; Hummon, A.B.; Desaire, H. Workflow for Evaluating Normalization Tools for Omics Data Using Supervised and Unsupervised Machine Learning. J. Am. Soc. Mass Spectrom. 2023, 34, 2775–2784. [Google Scholar] [CrossRef] [PubMed]
Tokareva, A.; Starodubtseva, N.; Frankevich, V.; Silachev, D. Minimizing Cohort Discrepancies: A Comparative Analysis of Data Normalization Approaches in Biomarker Research. Computation 2024, 12, 137. [Google Scholar] [CrossRef]
Tokareva, A.O.; Chagovets, V.V.; Kononikhin, A.S.; Starodubtseva, N.L.; Nikolaev, E.N.; Frankevich, V.E. Comparison of the effectiveness of variable selection method for creating a diagnostic panel of biomarkers for mass spectrometric lipidome analysis. J. Mass Spectrom. 2021, 56, e4702. [Google Scholar] [CrossRef] [PubMed]
Abd-Elnaby, M.; Alfonse, M.; Roushdy, M. Classification of breast cancer using microarray gene expression data: A survey. J. Biomed. Inform. 2021, 117, 103764. [Google Scholar] [CrossRef]
Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312. [Google Scholar] [CrossRef]
Wu, Z.; Chen, H.; Ke, S.; Mo, L.; Qiu, M.; Zhu, G.; Zhu, W.; Liu, L. Identifying potential biomarkers of idiopathic pulmonary fibrosis through machine learning analysis. Sci. Rep. 2023, 13, 16559. [Google Scholar] [CrossRef]
Tian, Y.; Tao, K.; Li, S.; Chen, X.; Wang, R.; Zhang, M.; Zhai, Z. Identification of m6A-Related Biomarkers in Systemic Lupus Erythematosus: A Bioinformation-Based Analysis. J. Inflamm. Res. 2024, 17, 507–526. [Google Scholar] [CrossRef]
Zhu, T.; Ma, Y.; Wang, J.; Xiong, W.; Mao, R.; Cui, B.; Min, Z.; Song, Y.; Chen, Z. Serum Metabolomics Reveals Metabolomic Profile and Potential Biomarkers in Asthma. Allergy Asthma Immunol. Res. 2024, 16, 235–252. [Google Scholar] [CrossRef]
Chardin, D.; Humbert, O.; Bailleux, C.; Burel-Vandenbos, F.; Rigau, V.; Pourcher, T.; Barlaud, M. Primal-dual for classification with rejection (PD-CR): A novel method for classification and feature selection—An application in metabolomics studies. BMC Bioinform. 2021, 22, 594. [Google Scholar] [CrossRef]
Zhou, D.; Zhu, W.; Sun, T.; Wang, Y.; Chi, Y.; Chen, T.; Lin, J. iMAP: A Web Server for Metabolomics Data Integrative Analysis. Front. Chem. 2021, 9, 659656. [Google Scholar] [CrossRef]
Alamro, H.; Thafar, M.A.; Albaradei, S.; Gojobori, T.; Essack, M.; Gao, X. Exploiting machine learning models to identify novel Alzheimer’s disease biomarkers and potential targets. Sci. Rep. 2023, 13, 4979. [Google Scholar] [CrossRef] [PubMed]
Wang, D.; Greenwood, P.; Klein, M.S. Deep learning for rapid identification of microbes using metabolomics profiles. Metabolites 2021, 11, 863. [Google Scholar] [CrossRef] [PubMed]
Li, J.; Zhou, Z.; Dong, J.; Fu, Y.; Li, Y.; Luan, Z.; Peng, X. Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS ONE 2021, 16, e0250370. [Google Scholar] [CrossRef]
Chagovets, V.V.; Vasil’ev, V.G.; Iurova, M.V.; Khabas, G.N.; Pavlovich, S.V.; Starodubtseva, N.L.; Mayboroda, O.A. Metabolic “footprints” of the circulating cancer mucins: CA125 in the high-grade ovarian cancer. Bull. Russ. State Med. Univ. 2021, 6, 10–16. [Google Scholar] [CrossRef]
Pal, M.; Foody, G.M. Feature selection for classification of hyperspectral data by SVM. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2297–2307. [Google Scholar] [CrossRef]
Gonzalez-Covarrubias, V. Lipidomics in longevity and healthy aging. Biogerontology 2013, 14, 663–672. [Google Scholar] [CrossRef]
Naudin, S.; Sampson, J.N.; Moore, S.C.; Albanes, D.; Freedman, N.D.; Weinstein, S.J.; Stolzenberg-Solomon, R. Lipidomics and pancreatic cancer risk in two prospective studies. Eur. J. Epidemiol. 2023, 38, 783–793. [Google Scholar] [CrossRef]
Li, F.; Qin, X.; Chen, H.; Qiu, L.; Guo, Y.; Liu, H.; Chen, G.; Song, G.; Wang, X.; Li, F.; et al. Lipid profiling for early diagnosis and progression of colorectal cancer using direct-infusion electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry. Rapid Commun. Mass Spectrom. 2013, 27, 24–34. [Google Scholar] [CrossRef]
Trabert, B.; Hathaway, C.A.; Rice, M.S.; Rimm, E.B.; Sluss, P.M.; Terry, K.L.; Zeleznik, O.A.; Tworoger, S.S. Ovarian Cancer Risk in Relation to Blood Cholesterol and Triglycerides. Cancer Epidemiol. Biomark. Prev. 2021, 30, 2044–2051. [Google Scholar] [CrossRef]
Xu, C.; Zhou, D.; Luo, Y.; Guo, S.; Wang, T.; Liu, J.; Liu, Y.; Li, Z. Tissue and serum lipidome shows altered lipid composition with diagnostic potential in mycosis fungoides. Oncotarget 2017, 8, 48041–48050. [Google Scholar] [CrossRef]
Cotte, A.K.; Cottet, V.; Aires, V.; Mouillot, T.; Rizk, M.; Vinault, S.; Binquet, C.; De Barros, J.P.P.; Hillon, P.; Delmas, D. Phospholipid profiles and hepatocellular carcinoma risk and prognosis in cirrhotic patients. Oncotarget 2019, 10, 2161–2172. [Google Scholar] [CrossRef] [PubMed]
López, N.C.; García-Ordás, M.T.; Vitelli-Storelli, F.; Fernández-Navarro, P.; Palazuelos, C.; Alaiz-Rodríguez, R. Evaluation of feature selection techniques for breast cancer risk prediction. Int. J. Environ. Res. Public Health 2021, 18, 10670. [Google Scholar] [CrossRef] [PubMed]
Okser, S.; Pahikkala, T.; Aittokallio, T. Genetic variants and their interactions in disease risk prediction—Machine learning and network perspectives. BioData Min. 2013, 6, 5. [Google Scholar] [CrossRef] [PubMed]
Barbieri, M.C.; Grisci, B.I.; Dorn, M. Analysis and comparison of feature selection methods towards performance and stability. Expert Syst. Appl. 2024, 249, 123667. [Google Scholar] [CrossRef]
He, Z.; Yu, W. Stable feature selection for biomarker discovery. Comput. Biol. Chem. 2010, 34, 215–225. [Google Scholar] [CrossRef]
Mohamed, M.; Abdullah, A.; Zaki, A.M.; Rizk, F.H.; Eid, M.M.; El-Kenway, E.M. Advances and Challenges in Feature Selection Methods: A Comprehensive Review. J. Artif. Intell. Metaheuristics 2024, 7, 67–77. [Google Scholar] [CrossRef]
Abdullah, T.A.A.; Zahid, M.S.M.; Ali, W. A Review of Interpretable ML in Healthcare: Taxonomy, Applications, Challenges, and Future Directions. Symmetry 2021, 13, 2439. [Google Scholar]
Harrison, C.J.; Sidey-Gibbons, C.J. Machine learning in medicine: A practical introduction to natural language processing. BMC Med. Res. Methodol. 2021, 21, 158. [Google Scholar] [CrossRef]
Ban, D.; Housley, S.N.; Matyunina, L.V.; McDonald, L.D.E.; Bae-Jump, V.L.; Benigno, B.B.; Skolnick, J.; McDonald, J.F. A personalized probabilistic approach to ovarian cancer diagnostics. Gynecol. Oncol. 2024, 182, 168–175. [Google Scholar] [CrossRef]
Wu, Z.; Zhu, M.; Kang, Y.; Leung, E.L.H.; Lei, T.; Shen, C.; Jiang, D.; Wang, Z.; Cao, D.; Hou, T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief. Bioinform. 2021, 22, bbaa321. [Google Scholar] [CrossRef]
Farzaneh, F.; Salimnezhad, M.; Hosseini, M.S.; Ganjoei, T.A.; Arab, M.; Talayeh, M. D-dimer, Fibrinogen and Tumor Marker Levels in Patients with benign and Malignant Ovarian Tumors. Asian Pac. J. Cancer Prev. 2023, 24, 4263–4268. [Google Scholar] [CrossRef] [PubMed]
Hasenburg, A.; Eichkorn, D.; Vosshagen, F.; Obermayr, E.; Geroldinger, A.; Zeillinger, R.; Bossart, M. Biomarker-based early detection of epithelial ovarian cancer based on a five-protein signature in patient’s plasma—A prospective trial. BMC Cancer 2021, 21, 1037. [Google Scholar] [CrossRef] [PubMed]
Shan, D.; Cheng, S.; Ma, Y.; Peng, H. Serum levels of tumor markers and their clinical significance in epithelial ovarian cancer. J. Cent. South Univ. 2023, 48, 1039–1049. [Google Scholar] [CrossRef]
Periyasamy, A.; Gopisetty, G.; Subramanium, M.J.; Velusamy, S.; Rajkumar, T. Identification and validation of differential plasma proteins levels in epithelial ovarian cancer. J. Proteom. 2020, 226, 103893. [Google Scholar] [CrossRef]
Nazarizadeh, A.; Banirostam, T.; Biglari, T.; Kalantarhormozi, M.; Chichagi, F.; Behnoush, A.H.; Habibi, M.A.; Shahidi, R. Integrated neural network and evolutionary algorithm approach for liver fibrosis staging: Can artificial intelligence reduce patient costs? JGH Open 2024, 8, e13075.
Qaderi, K.; Sharifipour, F.; Dabir, M.; Shams, R.; Behmanesh, A. Artificial intelligence (AI) approaches to male infertility in IVF: A mapping review. Eur. J. Med. Res. 2025, 30, 246.
Nahar, A.; Paul, S.; Saikia, M.J. A systematic review on machine learning approaches in cerebral palsy research. PeerJ 2024, 12, e18270.
Smiley, A.; Villarreal-Zegarra, D.; Reategui-Rivera, C.M.; Escobar-Agreda, S.; Finkelstein, J. Methodological and reporting quality of machine learning studies on cancer diagnosis, treatment, and prognosis. Front. Oncol. 2025, 15, 1555247.
Gómez-Pascual, A.; Naccache, T.; Xu, J.; Hooshmand, K.; Wretlind, A.; Gabrielli, M.; Lombardo, M.T.; Shi, L.; Buckley, N.J.; Tijms, B.M.; et al. Paired plasma lipidomics and proteomics analysis in the conversion from mild cognitive impairment to Alzheimer’s disease. Comput. Biol. Med. 2024, 176, 108588.
Wang, K.; Theeke, L.A.; Liao, C.; Wang, N.; Lu, Y.; Xiao, D.; Xu, C. Deep learning analysis of UPLC-MS/MS-based metabolomics data to predict Alzheimer’s disease. J. Neurol. Sci. 2023, 453, 120812.
Zhang, T.H.; Hasib, M.M.; Chiu, Y.C.; Han, Z.F.; Jin, Y.F.; Flores, M.; Chen, Y.; Huang, Y. Transformer for Gene Expression Modeling (T-GEM): An Interpretable Deep Learning Model for Gene Expression-Based Phenotype Predictions. Cancers 2022, 14, 4763.
Kalkan, H.; Akkaya, U.M.; Inal-Gültekin, G.; Sanchez-Perez, A.M. Prediction of Alzheimer’s Disease by a Novel Image-Based Representation of Gene Expression. Genes 2022, 13, 1406.
El-Melegy, M.; Mamdouh, A.; Ali, S.; Badawy, M.; El-Ghar, M.A.; Alghamdi, N.S.; El-Baz, A. Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning. Bioengineering 2024, 11, 635.
Karim, A.; Su, Z.; West, P.K.; Keon, M.; The NYGC ALS Consortium; Shamsani, J.; Brennan, S.; Wong, T.; Milicevic, O.; Teunisse, G.; et al. Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values. Genes 2021, 12, 1754.
Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Ng, W.; Minasny, B.; de Sousa Mendes, W.; Demattê, J.A.M. The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data. SOIL 2020, 6, 565–578.
Yilmaz, E.O.; Kavzoglu, T. Analysis of the effect of training sample size on the performance of 2D CNN models. Intercont. Geoinf. Days 2021, 2, 241–244.
Kim, D.; Seo, S.B.; Yoo, N.H.; Shin, G. A Study on Sample Size Sensitivity of Factory Manufacturing Dataset for CNN-Based Defective Product Classification. Computation 2022, 10, 142.
Alenizy, H.A.; Berri, J. Transforming tabular data into images via enhanced spatial relationships for CNN processing. Sci. Rep. 2025, 15, 17004.
Elmannai, H.; El-Rashidy, N.; Mashal, I.; Alohali, M.A.; Farag, S.; El-Sappagh, S.; Saleh, H. Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence. Diagnostics 2023, 13, 1056.
Sah, S.; Bifarin, O.O.; Moore, S.G.; Gaul, D.A.; Chung, H.; Kwon, S.Y.; Cho, H.; Cho, C.H.; Kim, J.H.; Kim, J.; et al. Serum Lipidome Profiling Reveals a Distinct Signature of Ovarian Cancer in Korean Women. Cancer Epidemiol. Biomark. Prev. 2024, 33, 681–693.
Tanioka, S.; Ishida, F.; Nakano, F.; Kawakita, F.; Kanamaru, H.; Nakatsuka, Y.; Nishikawa, H.; Suzuki, H. Machine Learning Analysis of Matricellular Proteins and Clinical Variables for Early Prediction of Delayed Cerebral Ischemia After Aneurysmal Subarachnoid Hemorrhage. Mol. Neurobiol. 2019, 56, 7128–7135.
Belsti, Y.; Moran, L.; Du, L.; Mousa, A.; De Silva, K.; Enticott, J.; Teede, H. Comparison of machine learning and conventional logistic regression-based prediction models for gestational diabetes in an ethnically diverse population; the Monash GDM Machine learning model. Int. J. Med. Inform. 2023, 179, 105228.
Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Lect. Notes Comput. Sci. (Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform.) 2009, 5476 LNAI, 475–482.
Kivrak, M.; Avci, U.; Uzun, H.; Ardic, C. The Impact of the SMOTE Method on Machine Learning and Ensemble Learning Performance Results in Addressing Class Imbalance in Data Used for Predicting Total Testosterone Deficiency in Type 2 Diabetes Patients. Diagnostics 2024, 14, 2634.
Ramezankhani, A.; Pournik, O.; Shahrabi, J.; Azizi, F.; Hadaegh, F.; Khalili, D. The impact of oversampling with SMOTE on the performance of 3 classifiers in prediction of type 2 diabetes. Med. Decis. Mak. 2016, 36, 137–144.
Hassanzadeh, R.; Farhadian, M.; Rafieemehr, H. Hospital mortality prediction in traumatic injuries patients: Comparing different SMOTE-based machine learning algorithms. BMC Med. Res. Methodol. 2023, 23, 101.
Mohseni-Takalloo, S.; Mohseni, H.; Mozaffari-Khosravi, H.; Mirzaei, M.; Hosseinzadeh, M. The effect of data balancing approaches on the prediction of metabolic syndrome using non-invasive parameters based on random forest. BMC Bioinform. 2024, 25, 18.
Blagus, R.; Lusa, L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. 2013, 14, 106.
Welvaars, K.; Oosterhoff, J.H.F.; van den Bekerom, M.P.J.; Doornberg, J.N.; van Haarst, E.P.; van der Zee, J.A.; van Andel, G.A.; Lagerveld, B.W.; Hovius, M.C.; Kauer, P.C.; et al. Implications of resampling data to address the class imbalance problem (IRCIP): An evaluation of impact on performance between classification algorithms in medical data. JAMIA Open 2023, 6, ooad033.
Starodubtseva, N.L.; Tokareva, A.O.; Rodionov, V.V.; Brzhozovskiy, A.G.; Bugrova, A.E.; Chagovets, V.V.; Kometova, V.V.; Kukaev, E.N.; Soares, N.C.; Kovalev, G.I.; et al. Integrating Proteomics and Lipidomics for Evaluating the Risk of Breast Cancer Progression: A Pilot Study. Biomedicines 2023, 11, 1786.
Tonoyan, N.M.; Chagovets, V.V.; Starodubtseva, N.L.; Tokareva, A.O.; Chingin, K.; Kozachenko, I.F.; Adamyan, L.V.; Frankevich, V.E. Alterations in lipid profile upon uterine fibroids and its recurrence. Sci. Rep. 2021, 11, 11447.
Koelmel, J.P.; Kroeger, N.M.; Ulmer, C.Z.; Bowden, J.A.; Patterson, R.E.; Cochran, J.A.; Beecher, C.W.W.; Garrett, T.J.; Yost, R.A. LipidMatch: An automated workflow for rule-based lipid identification using untargeted high-resolution tandem mass spectrometry data. BMC Bioinform. 2017, 18, 331.
Tokareva, A.O.; Chagovets, V.V.; Kononikhin, A.S.; Starodubtseva, N.L.; Nikolaev, E.N.; Frankevich, V.E. Normalization methods for reducing interbatch effect without quality control samples in liquid chromatography-mass spectrometry-based studies. Anal. Bioanal. Chem. 2021, 413, 3479–3486.
Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E.A.; Glass, C.K.; Merrill, A.H.; Murphy, R.C.; Raetz, C.R.H.; Russell, D.W.; et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res. 2007, 35, 527–532.
Galindo-Prieto, B.; Eriksson, L.; Trygg, J. Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS). J. Chemom. 2014, 28, 623–632.
Menze, B.H.; Kelm, B.M.; Masuch, R.; Himmelreich, U.; Bachert, P.; Petrich, W.; Hamprecht, F.A. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinform. 2009, 10, 213.
Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
Kursa, M.B.; Rudnicki, W.R. Feature selection with the Boruta package. J. Stat. Softw. 2010, 36, 1–13.
Koch, L.F. Index of Biotal Dispersity. Ecology 1957, 38, 145–148.
Hubert, L.J.; Levin, J.R. A general statistical framework for assessing categorical clustering in free recall. Psychol. Bull. 1976, 83, 1072–1080.
Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
Caliński, T.; Harabasz, J. A Dendrite Method for Cluster Analysis. Commun. Stat. 1974, 3, 1–27.
Sharma, A.; Vans, E.; Shigemizu, D.; Boroevich, K.A.; Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 2019, 9, 11399.
Clerc, M.; Kennedy, J. The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space. IEEE Trans. Evol. Comput. 2002, 6, 58–73.
Thévenot, E.A.; Roux, A.; Xu, Y.; Ezan, E.; Junot, C. Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses. J. Proteome Res. 2015, 14, 3322–3335.
Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
Meyer, D. Support Vector Machines. The Interface to libsvm in package. R News 2024, 8, e107.
Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Volume 13–17, pp. 785–794.
Dudek, A.; Walesiak, M. The Choice of Variable Normalization Method in Cluster Analysis. Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges. In Proceedings of the 35th International Business Information Management Association Conference (IBIMA), Seville, Spain, 1–2 April 2020; pp. 325–340.
Siriseriwan, W. A Collection of Oversampling Techniques for Class Imbalance Problem Based on SMOTE 2024. Available online: https://reddertar.r-universe.dev/smotefamily (accessed on 7 July 2025).
Wild, F. Latent Semantic Analysis 2022. Available online: https://cran.r-project.org/web/packages/lsa/index.html (accessed on 7 July 2025).
Donaldson, J. T-Distributed Stochastic Neighbor Embedding for R (t-SNE) 2022. Available online: https://CRAN.R-project.org/package=tsne (accessed on 7 July 2025).
Barber, C.B. Convex Hull in Arbitrary Dimension. 2018. Available online: https://cran.r-project.org/src/contrib/Archive/cxhull/ (accessed on 7 July 2025).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26.
Kalinowski, T.; Falbe, D.; Allaire, J.; Chollet, F.; RStudio; Google; Tang, Y.; Van Der Bijl, W.; Studer, M.; Keydana, S. R Interface to “Keras”. 2024. Available online: https://keras3.posit.co/index.html (accessed on 7 July 2025).

Figure 1. Comparative analysis (Venn diagrams) of biomarker sets identified by Mann–Whitney U test (blue) and SVM-RFE feature selection (yellow) methods in binary classification tasks: (A) Malignant versus benign tumor discrimination. (B) Benign tumor versus healthy control differentiation. (C) Malignant versus control classification.

Figure 2. Comparative biomarker discovery in OC/benign tumor/control classification using multivariate feature selection approaches: (A) Kruskal–Wallis and PLS-DA; (B) Kruskal–Wallis, Mann–Whitney, PLS-DA and SVM-RFE.

Figure 3. Performance evaluation (accuracy and mean recall) of machine learning models using different feature selection strategies: SVM-RFE binary feature selection (A,B), and Mann–Whitney feature selection (C,D) for binary classification and multiclass classification (E,F). BvM—benign versus malignant tumor separation, CvB—control versus benign group separation, CvM—control versus malignant group separation, OvO—one-versus-one classification.

Figure 4. Receiver operating characteristic (ROC) analysis of optimal binary classification model for OC detection. AUC—area under the curve. Green color indicates XGBoost with SVM-RFE-selected features for control versus benign group separation; black represents RF with SVM-RFE features for control versus malignant group separation; blue corresponds to CNN with Mann–Whitney-selected features for benign versus malignant tumor separation; and red denotes XGBoost with Mann–Whitney features for control versus malignant group separation.

Figure 5. (A) Pipeline of feature selection methods. Selection_B: feature selection methods for binary classification; selection_M: feature selection methods for multiclass classification. (B) Pipeline of binary classification methods. PSO: methods tuned with particle swarm optimization. (C) Pipeline of multiclass classification methods. PSO: methods tuned with particle swarm optimization.

Table 1. Clinical characteristics of study participants.

Variable	OC (n = 103)	Benign Tumor (n = 107)	Control Group (n = 19)	p Value (Kruskal–Wallis H Test)
Age, years, Median (Q1;Q3)	51.0 (39.0;60.0)	38.0 (34.0;45.0)	39.5 (34.0;60.3)	<0.001
BMI (kg/m²), Median (Q1;Q3)	25.0 (22.0;27.8)	23.5 (21.0;27.0)	22.5 (20.8;25.3)	0.27
Benign ovarian tumors, n (%)	-	cystadenoma: 30 (28%); endometrioid cyst: 56 (52%); mature teratoma: 21 (20%)	-	-
Borderline tumors, n (%)	28 (27%)	-	-	-
Low-grade OC, n (%)	16 (16%)	-	-	-
FIGO stage (high-grade OC), n (%)	IA: 5 (4.9%); IIB: 5 (4.9%); IIC: 1 (1.0%); IIIA: 4 (3.9%); IIIC: 40 (39%); IVA: 4 (3.9%)	-	-	-

Table 2. Comparative stability analysis of feature selection methods using Koch’s biotic diversity index for binary classification tasks. OPLS-DA—Orthogonal Projection on Latent Structures-Discriminant Analysis; RF—Random Forest; SVM-RFE—Support Vector Machine-Recursive Feature Elimination; LASSO—least absolute shrinkage and selection operator.

Method	Benign vs. Malignant	Control vs. Benign	Control vs. Malignant	Mean (Score)
Mann–Whitney	0.41	0.31	0.47	0.40 (5)
Welch	0.35	0.27	0.35	0.32 (4)
OPLS-DA	0.50	0.51	0.57	0.53 (6)
RF	0.19	0.20	0.16	0.18 (3)
SVM-RFE	0.94	0.59	0.71	0.75 (7)
LASSO	0.20	0.14	0.06	0.14 (1)
Boruta	0.17	0.14	0.12	0.14 (2)
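
As a rough illustration of how a stability score like those in Table 2 can be obtained, the sketch below computes Koch's index of biotal dispersity for the feature sets selected in repeated runs. It assumes the common form IBD = (T − S)/(S(n − 1)), where T is the summed size of the n selected sets and S is the number of distinct features across them. The function name `koch_index` and the lipid labels are hypothetical, and the resampling scheme used in the study is not reproduced here.

```python
def koch_index(feature_sets):
    """Koch's index of biotal dispersity for n >= 2 selected feature sets.

    Assumed form: (T - S) / (S * (n - 1)), with T the summed set sizes
    and S the number of distinct features over all sets.
    """
    sets = [set(s) for s in feature_sets]
    n = len(sets)
    if n < 2:
        raise ValueError("need at least two feature sets")
    union = set().union(*sets)
    if not union:
        raise ValueError("all feature sets are empty")
    total = sum(len(s) for s in sets)   # T
    distinct = len(union)               # S
    return (total - distinct) / (distinct * (n - 1))


# Hypothetical selections from three resampling runs (labels are made up).
runs = [
    {"lipid_a", "lipid_b", "lipid_c"},
    {"lipid_a", "lipid_b", "lipid_d"},
    {"lipid_a", "lipid_e", "lipid_f"},
]
stability = koch_index(runs)
print(f"Koch's index: {stability:.2f}")
```

Under this definition a value of 1 means every run selected exactly the same features, and 0 means no feature was selected more than once.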

Table 3. Cluster separation metrics in PC space for clinical groups using selected features.

Metric	Method	Benign vs. Malignant	Control vs. Benign	Control vs. Malignant	Combined Feature Set	Mean (Score)
Hubert–Levin’s C index	Mann–Whitney	0.46	0.42	0.45	0.44	0.44 (7)
	Welch	0.46	0.42	0.46	0.44	0.44 (6)
	OPLS-DA	0.45	0.46	0.47	0.47	0.46 (4)
	RF	100.00	100.00	100.00	0.47	75.12 (1)
	SVM-RFE	0.47	0.44	0.46	0.45	0.45 (5)
	LASSO	0.46	100.00	100.00	0.46	50.23 (2)
	Boruta	0.44	100.00	100.00	0.44	50.22 (3)
Davies–Bouldin’s index	Mann–Whitney	3.35	4.09	3.87	3.95	3.81 (7)
	Welch	3.33	4.37	4.46	4.49	4.16 (6)
	OPLS-DA	4.76	16.44	15.23	13.39	12.45 (4)
	RF	100.00	100.00	100.00	7.55	76.89 (1)
	SVM-RFE	10.75	7.43	7.66	8.77	8.65 (5)
	LASSO	2.80	100.00	100.00	2.80	51.40 (3)
	Boruta	2.72	100.00	100.00	2.72	51.36 (2)
Calinski–Harabasz pseudo-F statistic	Mann–Whitney	−12.79	47.35	−4.53	1.53	7.89 (4)
	Welch	−13.42	45.04	1.10	7.99	10.18 (5)
	OPLS-DA	4.25	30.06	24.15	23.39	20.46 (7)
	RF	−100.00	−100.00	−100.00	15.92	−71.02 (1)
	SVM-RFE	11.66	11.86	20.36	10.89	13.69 (6)
	LASSO	−14.29	−100.00	−100.00	−14.29	−57.14 (2)
	Boruta	8.82	−100.00	−100.00	8.82	−45.59 (3)
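
For readers unfamiliar with the cluster separation metrics in Table 3, here is a minimal pure-Python sketch of the Davies–Bouldin index, for which lower values indicate better-separated groups. The toy 2-D points and the function names are hypothetical. The table reports metrics in principal component space, and this sketch does not reproduce that preprocessing or the study's software.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids; coinciding
    centroids would cause a division by zero.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = [
        sum(dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    worst = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))  # least favourable neighbour of cluster i
    return sum(worst) / len(worst)


# Hypothetical toy data: two groups of two points each.
group_a = [(0.0, 0.0), (0.0, 2.0)]
group_b = [(4.0, 0.0), (4.0, 2.0)]
db = davies_bouldin([group_a, group_b])
print(f"Davies-Bouldin index: {db:.2f}")
```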

Table 4. Robustness and discriminatory power assessment of selected biomarker panels in multiclass OC classification.

Method	Koch’s Index (Score)	Hubert–Levin’s C Index (Score)	Davies–Bouldin’s Index (Score)	Calinski–Harabasz Pseudo-F Statistic (Score)	Sum Score (Rank)
Kruskal–Wallis	0.47 (5)	0.44 (4)	3.95 (5)	1.37 (4)	18 (1)
PLS-DA	0.46 (4)	0.42 (5)	7.69 (4)	15.19 (5)	18 (1)
RF	0.18 (1)	100.00 (3)	100.00 (3)	−100.00 (3)	10 (2)
LASSO	0.21 (3)	100.00 (2)	100.00 (2)	−100.00 (2)	9 (3)
Boruta	0.18 (2)	100.00 (1)	100.00 (1)	−100.00 (1)	5 (4)
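
The Sum Score column in Table 4 is the sum of the four per-metric scores, and the rank orders the methods by that sum. The small sketch below reproduces the column from the scores printed in the table, assuming that tied sums share a rank and that the next rank is not skipped. The dictionary and function names are illustrative only.

```python
# Per-metric scores from Table 4, in column order:
# Koch, Hubert-Levin C, Davies-Bouldin, Calinski-Harabasz.
SCORES = {
    "Kruskal-Wallis": [5, 4, 5, 4],
    "PLS-DA": [4, 5, 4, 5],
    "RF": [1, 3, 3, 3],
    "LASSO": [3, 2, 2, 2],
    "Boruta": [2, 1, 1, 1],
}


def sum_scores(scores):
    """Total score per method."""
    return {name: sum(vals) for name, vals in scores.items()}


def dense_rank(totals):
    """Rank 1 for the highest total; ties share a rank, no gaps after ties."""
    distinct = sorted(set(totals.values()), reverse=True)
    return {name: distinct.index(t) + 1 for name, t in totals.items()}


totals = sum_scores(SCORES)
ranks = dense_rank(totals)
for name in SCORES:
    print(f"{name}: {totals[name]} ({ranks[name]})")
```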

Table 5. Prognostic performance of the best combinations of classification model and feature selection method across clinical groups.

Model, Feature Selection Method	Predicted Outcome	Control Group (n = 20)	Benign Group (n = 32)	Malignant Group (n = 30)
XGBoost, Kruskal–Wallis set	control	18 (90%)	3	2
	benign	2	27 (84%)	10
	malignant	0	2	18 (60%)
OvO CNN, Mann–Whitney set	control	17 (85%)	5	3
	benign	0	20 (63%)	1
	malignant	3	7	26 (87%)
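
To make the percentages in Table 5 easier to check, the following sketch recomputes per-class recall, overall accuracy, and mean recall from the counts in the table. Rows are predicted outcomes and columns are clinical groups, as in the table. The variable and function names are illustrative and are not taken from the study's code.

```python
CLASSES = ["control", "benign", "malignant"]

# Counts copied from Table 5: rows = predicted outcome, columns = clinical group.
TABLE_5 = {
    "XGBoost, Kruskal-Wallis set": [
        [18, 3, 2],
        [2, 27, 10],
        [0, 2, 18],
    ],
    "OvO CNN, Mann-Whitney set": [
        [17, 5, 3],
        [0, 20, 1],
        [3, 7, 26],
    ],
}


def summarize(matrix):
    """Return (accuracy, per-class recalls, mean recall) for a square matrix."""
    n = len(matrix)
    # Column totals are the group sizes (number of true members per class).
    group_sizes = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    correct = [matrix[i][i] for i in range(n)]
    recalls = [correct[i] / group_sizes[i] for i in range(n)]
    accuracy = sum(correct) / sum(group_sizes)
    mean_recall = sum(recalls) / n
    return accuracy, recalls, mean_recall


for model, matrix in TABLE_5.items():
    accuracy, recalls, mean_recall = summarize(matrix)
    per_class = ", ".join(f"{c}: {r:.0%}" for c, r in zip(CLASSES, recalls))
    print(f"{model}: accuracy {accuracy:.1%}, mean recall {mean_recall:.1%} ({per_class})")
```

For the XGBoost model this gives recalls of 90%, 84% and 60%, matching the diagonal percentages in the table.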


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
