Article

Drug Repurposing in Glioblastoma Using a Machine Learning-Based Hybrid Feature Selection Approach

Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(24), 11871; https://doi.org/10.3390/ijms262411871
Submission received: 13 November 2025 / Revised: 5 December 2025 / Accepted: 7 December 2025 / Published: 9 December 2025
(This article belongs to the Section Molecular Informatics)

Abstract

Glioblastoma (GBM) is a fatal and aggressive form of brain cancer, characterized by rapid progression, poor prognosis, and limited treatment options. This study applies a hybrid of two popular feature selection methods to drug sensitivity features in GBM versus other cancer cell lines, using a rank-based weighting combination scheme to identify discriminative sets of drug compound features. This approach is needed to reduce dimensionality and enhance classification performance while increasing the interpretability of the prediction model. The experimental results indicate that the machine learning (ML)-driven feature selection approach achieves more than 95% accuracy with at most 11 selected features for each drug sensitivity metric on the high-dimensional Genomics of Drug Sensitivity in Cancer (GDSC) datasets. Our drug compound-based findings demonstrate that the feature selection approach improves model stability and performance, paving the way for more precise and clinically actionable advancements in GBM research.

1. Introduction

Glioblastoma (GBM) is the most aggressive and devastating primary cancer of the central nervous system (CNS), with a poor prognosis and limited treatment options [1]. Globally, brain and other CNS cancers are responsible for around 330,000 incident cases and 227,000 deaths per year [2]. GBM is the most commonly occurring malignant brain tumor in adults, representing 14.6% of all brain and CNS tumors and 48.3% of malignant tumors in US registry data [3]. The population-based incidence of GBM in the US is 3.19 cases per 100,000 people, with a median age of 64 years and higher rates in males [4]. A systematic review in the US shows that GBM is associated with very high direct costs (USD 400–430 k per patient) [5]. For newly diagnosed GBM, the standard treatment approach involves maximal surgical resection, followed by concurrent chemoradiotherapy (CRT) and then adjuvant temozolomide (TMZ) [6,7,8,9]. On average, patients with GBM survive 14 to 15 months following diagnosis [10]. Although significant progress has been made in molecular classification and proteogenomic characterization [11,12,13,14,15], novel approaches [16,17] have not resulted in improved outcomes, and standard of care (SOC) management has remained unchanged for over 20 years [6]. GBM tumors frequently recur secondary to tumor resistance [18]. Several critical hallmarks of cancer pathways have been implicated in driving GBM resistance, with crossover into all modalities of cell kill, including the radiation and chemotherapy that form the basis of SOC [19,20]. Drug resistance in GBM remains a critical barrier, and the transfer of existing therapeutic agents to GBM has been limited by both target specificity and blood–brain barrier penetration.
Given ongoing pressure to optimize management while improving speed, cost, and accuracy, there is increased focus on drug repurposing, particularly to leverage drugs that are already approved by the Food and Drug Administration (FDA) for use in humans [21,22,23]. There are currently no biomarkers for GBM to direct management based on resistance patterns, and GBMs are highly heterogeneous tumors [24]. To identify potential drug sensitivity in GBM, cell lines, gene expression profiles, and patient-derived xenograft (PDX) models have been utilized to probe the possibility of response [25,26]. Modern nanoscale imaging, particularly atomic force microscopy and other scanning-probe methods, can also provide more than high-resolution structural snapshots: these methods enable quantitative, single-cell measurement of mechanical and biophysical parameters such as stiffness and adhesion [27]. Atomic force microscopy-based nanomechanical profiling can therefore reveal biophysical “signatures” distinguishing healthy glial cells from GBM cells. Different GBM cell lines can also show significant differences in nanomechanical and viscoelastic properties, cytoskeletal organization, and migration behavior [28]. These properties can serve as a nanomechanical signature or as biomarkers of GBM cell heterogeneity, aggressiveness, and potentially disease progression [28]. While several datasets can be employed for drug sensitivity work, consistent labels and standardized dose–response curves, as well as strong connections to matched genomic, target, and pathway assignments, are key to arriving at clinically actionable results [29]. As a result, drug sensitivity datasets are composed of a large number of variables (i.e., features), including several that can potentially define the sensitivity to a drug, and are therefore high-dimensional, with dimensionality varying across datasets. In addition, datasets can have significant amounts of missing data.
Feature selection is a critical process for dimensionality reduction: it enhances data visualization and interpretation, minimizes computational and storage requirements, accelerates model training, and helps overcome the curse of dimensionality to boost predictive model performance [30,31]. With advances in technology and artificial intelligence (AI)-based methods, such as machine learning models, feature selection can be carried out efficiently in complex biomedical datasets in which patient or cell line management varies, with the aim of improving speed, cost, and accuracy. By employing robust evaluation metrics and leveraging filter, wrapper, or embedded techniques, the selection of specific signals related to defined class labels, e.g., drug compounds in patients with GBM, can yield robust biomarker candidates. In this study, we introduce a fused machine learning-based feature selection and weighting method that finds significant drug compound candidates by discriminating GBM signals from other cancer cell lines in the Genomics of Drug Sensitivity in Cancer (GDSC) datasets.
The key contributions of this research, categorized into technical and clinical aspects, are summarized as follows:
Technical aspects
  • To the best of our knowledge, this is the first study that employs a combined feature selection and weighting methodology to select drug compounds that discriminate GBM cell lines from other cell lines, using four drug sensitivity metrics separately, namely, LN_IC50 (natural logarithm of the half-maximal inhibitory concentration), AUC (Area Under the Curve), RMSE (Root Mean Squared Error), and Z_SCORE (standard score).
  • We assessed the effects of varying the rank-based weights assigned to the feature selection methods on the classification model’s performance.
  • To improve the reliability and fairness of performance estimates under the imbalanced class distribution in the GDSC1 and GDSC2 datasets, we employed stratified five-fold cross-validation, maintaining the original class proportions within each data fold.
  • We conducted comprehensive experiments to evaluate feature selection and weighting with an ensemble machine learning classification model, namely, Random Forest (RF), to identify the optimal drug compound set among all combination sets and the minimal feature subset necessary for precise classification.
  • Feature weighting is a crucial stage: it determines which of the various features that emerge across the cross-validation folds are retained at the final decision point.
  • To observe the impact of the selected feature set of GBM cell lines on model prediction performance for each dataset and drug sensitivity metric, we plotted SHAP (SHapley Additive exPlanations)-based plots.
Clinical aspects
  • We leveraged GDSC datasets, employing feature selection techniques to identify signals that differentiate GBM cell lines vs. other cell lines. This is a novel finding, as drug compound-based GBM cell line profiles with machine learning (ML)-based methodology have not been previously characterized.
  • We discussed the potential interpretation of the identified critical features in association with GBM.
The rest of this paper is organized as follows: Section 2 presents a comprehensive overview of the experimental process, performance metrics, and computational results. Section 3 discusses the findings and their implications for drug repurposing. Section 4 describes the dataset employed for feature selection and weighting methodology. Lastly, Section 5 summarizes the conclusions of our research and explores potential avenues for future research.

2. Results

In this section, the experimental process and the performance metric employed for feature selection and classification are described. Then, the comprehensive computational results are provided according to each different combination of the scheme, datasets, and metrics.

2.1. Experimental Process

To perform the proposed methodology, we employed Python’s (version 3.9) scikit-learn library (version 1.6.1) for implementing the machine learning model and the mRMR (Minimum Redundancy–Maximum Relevance) package version 0.2.8 for the feature selection process. For SHAP analysis, we utilized the SHAP package version 0.48.0. All experiments were conducted on a MacBook Pro running macOS Sequoia 15.6.1, equipped with a 16-core Apple M3 Max processor and 128 GB of LPDDR5 memory. To apply the feature selection and weighting methodology for identifying the compounds that discriminate GBM cell lines, the GDSC datasets were first transformed into a pivot table organized by cell line and cancer type (i.e., GBM vs. others), with all potential drug compounds as columns for each drug sensitivity metric. We used the mean as the aggregation function for this step. We then applied the iterative imputation method [32] to impute missing data for each metric and dataset. As all features include missing values in both datasets, we used this multivariate imputation approach, which estimates each feature value from all the others, rather than removing features or samples with missing values. We chose full-dataset imputation to maintain consistency across large-scale experiments, avoid fold-specific variability, and keep the study computationally efficient. Although this approach may introduce limited information leakage, it maintains robustness within the biomedical setting. Each feature showed a varying level of missingness. The pattern appeared largely random, with no evident links to specific sample features. For example, Figure 1 illustrates the missing value statistics for the GDSC1 and GDSC2 datasets based on the AUC metric, showing the 20 features with the most missing values and the corresponding missing ratios.
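The pivot and imputation steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy records, the column names (`cell_line`, `is_gbm`, `drug`, `AUC`), and the imputer settings are all assumptions made for the example. Only the use of a mean-aggregated pivot table and scikit-learn's iterative imputation comes from the text.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer

# Hypothetical long-format records: one row per (cell line, drug) measurement.
records = pd.DataFrame({
    "cell_line": ["A", "A", "A", "B", "B", "C", "C", "D", "D", "D"],
    "is_gbm":    [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    "drug":      ["d1", "d2", "d2", "d1", "d3", "d2", "d3", "d1", "d2", "d3"],
    "AUC":       [0.91, 0.40, 0.44, 0.85, 0.30, 0.55, 0.35, 0.88, 0.47, 0.33],
})

# Pivot to a (cell line x drug) matrix; repeated measurements are averaged.
wide = records.pivot_table(
    index=["cell_line", "is_gbm"], columns="drug", values="AUC", aggfunc="mean"
)

# Untested (cell line, drug) pairs are NaN after the pivot. Iterative imputation
# models each column from the others; observed values are left unchanged.
imputer = IterativeImputer(random_state=0, max_iter=10)  # settings are assumptions
X = pd.DataFrame(imputer.fit_transform(wide), index=wide.index, columns=wide.columns)

# Class labels (1 = GBM, 0 = other) recovered from the pivot index.
y = wide.index.get_level_values("is_gbm").to_numpy()
```

Fitting the imputer on the full matrix mirrors the full-dataset choice described in the text; fitting it inside each training fold would be the usual way to avoid the leakage the authors acknowledge.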
To ensure optimal results, we employed one of the most effective machine learning models, Random Forest (RF). The predictive model was utilized in both the feature weighting-based selection and classification stages. In this study, we applied stratified five-fold cross-validation to obtain performance results, provide consistent data, and improve representation by reducing the bias in training and evaluation. To preserve uniformity in dataset results, we fixed the random state at 0 for the Random Forest model.
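As a rough sketch of this evaluation setup, the snippet below runs stratified five-fold cross-validation with a Random Forest whose random state is fixed at 0. The synthetic imbalanced data, the 95/5 class split, and the shuffling of folds are assumptions for illustration; the source specifies only the model, the fold count, stratification, and the random state.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for an imbalanced "GBM vs. others" matrix (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# Stratification keeps the minority-class share roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # shuffle is an assumption
model = RandomForestClassifier(random_state=0)

# One accuracy value per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Minority-class share in each test fold, for comparison with y.mean().
fold_minority_share = [y[test_idx].mean() for _, test_idx in cv.split(X, y)]
```

Comparing `fold_minority_share` with `y.mean()` is a quick check that each fold reflects the original class proportions.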

2.2. Performance Metrics

To evaluate the effectiveness of the hybrid feature selection and weighting method on drug compound-based cancer cell line datasets, we used the classification accuracy rate. This metric serves as a direct indicator of how well the selected features allow samples to be correctly categorized as GBM or other cancer cell lines.
Classification accuracy (ACC) is defined as the ratio of correctly classified samples to the total dataset samples. In other words, accuracy is computed by adding the number of true positive and true negative predictions and then dividing this sum by the total number of true positives, false negatives, true negatives, and false positives [33], as defined in Equation (1).
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
where TP, FN, TN, and FP represent the number of true positives, false negatives, true negatives, and false positives, respectively. We aimed to find the minimum number of selected features with the highest accuracy rate among all combination schemes, including rank-based feature weights for each dataset and drug sensitivity metric.
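The accuracy definition and the selection rule stated above (highest accuracy, then fewest features) can be written out as a short sketch. The function names and the tuple layout are hypothetical; the rows are a subset of the values reported in Table 1, and the confusion-matrix counts in the comment are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Classification accuracy as in Equation (1)."""
    return (tp + tn) / (tp + tn + fp + fn)


def best_row(rows):
    """Pick the row with the highest accuracy; break ties by fewest features."""
    top = max(row[3] for row in rows)
    return min((row for row in rows if row[3] == top), key=lambda row: row[2])


# (scheme, k, number of features, RF accuracy %) taken from part of Table 1.
rows = [
    ("LASSO=1, mRMR=2", 5, 36, 96.592),
    ("LASSO=1, mRMR=2", 6, 10, 96.592),
    ("LASSO=1, mRMR=2", 7, 6, 96.591),
    ("LASSO=1, mRMR=2", 8, 3, 96.488),
    ("LASSO=2, mRMR=1", 9, 31, 96.592),
    ("LASSO=2, mRMR=1", 10, 30, 96.592),
    ("LASSO=2, mRMR=1", 11, 5, 96.487),
    ("LASSO=2, mRMR=1", 12, 3, 96.488),
]

selected = best_row(rows)  # ("LASSO=1, mRMR=2", 6, 10, 96.592)

# Hypothetical counts: accuracy(tp=3, tn=90, fp=2, fn=5) is 93 / 100.
```

On these rows the rule returns the configuration reported in the text for the AUC-based GDSC1 data: 10 features at a total weight of 6.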

2.3. Computational Results

This subsection presents the effect of our feature selection and weighting approach on the performance of the machine learning model employed (i.e., RF) for different drug sensitivity metrics-based GDSC datasets.

2.3.1. The Effects of Feature Selection and Weighting Method on Classification Model Performance for the Categorization of GBM Cell Lines Versus Other Cell Lines on GDSC Datasets

We assessed the performance of LASSO (Least Absolute Shrinkage and Selection Operator) and mRMR-based feature selection methods using rank-based weighting schemes (with weights of 1 and 2) using stratified five-fold cross-validation. The computational results of these experiments, detailing the weight count (‘k’), and the number (#) of selected features, are fully tabulated in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. The changes in color, from red to green, in the tables represent the lowest (red) accuracy rate values to the highest accuracy rate values (green).
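One plausible reading of the rank-based weighting scheme is sketched below: each time a method selects a feature in a fold, the feature earns that method's weight, and features whose total weight reaches a threshold `k` are kept. The aggregation rule (a sum across folds with a `>= k` cutoff), the function names, and the toy fold selections are assumptions; the source states only the weights (1 and 2), the five folds, and the total weight value `k`.

```python
from collections import Counter


def total_weights(lasso_folds, mrmr_folds, w_lasso, w_mrmr):
    """Sum per-fold weights: a feature earns w_lasso (or w_mrmr) in each fold
    where LASSO (or mRMR) selected it."""
    totals = Counter()
    for selected in lasso_folds:
        for feature in selected:
            totals[feature] += w_lasso
    for selected in mrmr_folds:
        for feature in selected:
            totals[feature] += w_mrmr
    return totals


def select(totals, k):
    """Keep features whose total weight is at least k (assumed threshold rule)."""
    return sorted(f for f, w in totals.items() if w >= k)


# Hypothetical selections over five folds for four drug compounds.
lasso_folds = [{"d1", "d2"}, {"d1", "d3"}, {"d1", "d2"}, {"d1"}, {"d1", "d2"}]
mrmr_folds = [{"d1", "d4"}, {"d1", "d2"}, {"d1"}, {"d1", "d4"}, {"d1", "d2"}]

totals = total_weights(lasso_folds, mrmr_folds, w_lasso=1, w_mrmr=2)
# d1: 5*1 + 5*2 = 15, d2: 3*1 + 2*2 = 7, d3: 1, d4: 2*2 = 4
```

Under this reading, raising `k` shrinks the selected set toward features that both methods pick in nearly every fold, which matches the behavior described for high total weight values.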

2.3.2. GDSC1 Dataset Results

The first analysis used the area under the curve (AUC)-based drug data from the GDSC1 dataset. The highest (best) accuracy with the minimum number of selected features, a 96.592% accuracy rate (ACC), is achieved with 10 features by assigning rank-based weights of LASSO = 1 and mRMR = 2 at a total weight value of 6, using the Random Forest (RF) model (Table 1). We selected the RF model as it provides more efficient results than other prediction models. When the total weight value is kept near its maximum, the number of selected features is lowest, because this operation retains only features selected by both feature selection methods in each fold of the cross-validation. The selected features are then the most discriminative and informative, although some information loss is possible.
Table 1. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the AUC-based GDSC1 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 132 | 96.592 | 1 | 132 | 96.592 |
| 2 | 91 | 96.592 | 2 | 120 | 96.592 |
| 3 | 68 | 96.592 | 3 | 78 | 96.592 |
| 4 | 52 | 96.592 | 4 | 74 | 96.592 |
| 5 | 36 | 96.592 | 5 | 63 | 96.592 |
| 6 | 10 | 96.592 | 6 | 60 | 96.592 |
| 7 | 6 | 96.591 | 7 | 45 | 96.592 |
| 8 | 3 | 96.488 | 8 | 44 | 96.592 |
| 9 | 3 | 96.488 | 9 | 31 | 96.592 |
| 10 | 3 | 96.488 | 10 | 30 | 96.592 |
| 11 | 3 | 96.488 | 11 | 5 | 96.487 |
| 12 | 2 | 96.488 | 12 | 3 | 96.488 |
| 13 | 2 | 96.488 | 13 | 3 | 96.488 |
| 14 | 1 | 94.319 | 14 | 2 | 96.488 |
The second analysis used the half-maximal inhibitory concentration-based (i.e., LN_IC50) drug data from the GDSC1 dataset. The highest (best) accuracy with the minimum number of selected features, a 96.798% ACC, is achieved with 11 features by assigning rank-based weights of LASSO = 2 and mRMR = 1 at a total weight value of 11 (Table 2). The half-maximal inhibitory concentration is the concentration of a drug needed to inhibit 50% of the biological activity; greater drug potency corresponds to lower IC50 values [34]. ACC values are generally equal to or higher than 96.592%, which reflects the class imbalance in the dataset. Furthermore, using only the single most significant and discriminative feature (i.e., drug compound) yields a 93.594% ACC with the RF model, and using only two features yields a 96.180% ACC (Table 2).
Table 2. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the LN_IC50-based GDSC1 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 155 | 96.592 | 1 | 155 | 96.592 |
| 2 | 94 | 96.592 | 2 | 151 | 96.592 |
| 3 | 71 | 96.592 | 3 | 90 | 96.592 |
| 4 | 54 | 96.592 | 4 | 90 | 96.592 |
| 5 | 38 | 96.592 | 5 | 71 | 96.592 |
| 6 | 15 | 96.798 | 6 | 68 | 96.592 |
| 7 | 14 | 96.695 | 7 | 51 | 96.592 |
| 8 | 10 | 96.488 | 8 | 49 | 96.592 |
| 9 | 7 | 96.591 | 9 | 36 | 96.695 |
| 10 | 6 | 96.591 | 10 | 35 | 96.592 |
| 11 | 5 | 96.488 | 11 | 11 | 96.798 |
| 12 | 3 | 96.385 | 12 | 5 | 96.385 |
| 13 | 3 | 96.385 | 13 | 4 | 96.488 |
| 14 | 1 | 93.594 | 14 | 2 | 96.180 |
For the Root Mean Squared Error-based (i.e., RMSE) drug compounds regarding the GDSC1 dataset, the highest (best) accuracy value with the minimum number of selected features is obtained with 3 selected features by assigning LASSO = 1 and mRMR = 2, or LASSO = 2 and mRMR = 1 with a 96.798% ACC using the RF model (Table 3). If the number of selected features is more than one, this methodology obtains more than a 96% ACC for this type of dataset.
Table 3. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the RMSE-based GDSC1 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 141 | 96.592 | 1 | 141 | 96.592 |
| 2 | 95 | 96.592 | 2 | 136 | 96.592 |
| 3 | 68 | 96.592 | 3 | 90 | 96.592 |
| 4 | 47 | 96.592 | 4 | 87 | 96.592 |
| 5 | 32 | 96.592 | 5 | 65 | 96.592 |
| 6 | 16 | 96.695 | 6 | 64 | 96.592 |
| 7 | 14 | 96.695 | 7 | 46 | 96.592 |
| 8 | 11 | 96.592 | 8 | 46 | 96.592 |
| 9 | 11 | 96.592 | 9 | 32 | 96.592 |
| 10 | 5 | 96.385 | 10 | 30 | 96.592 |
| 11 | 4 | 96.488 | 11 | 14 | 96.592 |
| 12 | 3 | 96.798 | 12 | 10 | 96.695 |
| 13 | 3 | 96.798 | 13 | 4 | 96.488 |
| 14 | 1 | 93.906 | 14 | 3 | 96.798 |
For the Z_SCORE-based drug compounds regarding the GDSC1 dataset, the highest (best) accuracy value with the minimum number of selected features is obtained with 9 selected features, assigning LASSO = 1 and mRMR = 2 rank-based weights, a total weight value of 9, and a 96.799% ACC (Table 4).
Table 4. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the Z_SCORE-based GDSC1 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 178 | 96.592 | 1 | 178 | 96.592 |
| 2 | 120 | 96.592 | 2 | 175 | 96.592 |
| 3 | 93 | 96.592 | 3 | 117 | 96.592 |
| 4 | 61 | 96.592 | 4 | 114 | 96.592 |
| 5 | 39 | 96.695 | 5 | 90 | 96.592 |
| 6 | 14 | 96.695 | 6 | 88 | 96.592 |
| 7 | 13 | 96.592 | 7 | 58 | 96.592 |
| 8 | 11 | 96.488 | 8 | 56 | 96.592 |
| 9 | 9 | 96.799 | 9 | 36 | 96.695 |
| 10 | 4 | 96.696 | 10 | 35 | 96.695 |
| 11 | 4 | 96.696 | 11 | 10 | 96.798 |
| 12 | 3 | 96.179 | 12 | 8 | 96.695 |
| 13 | 3 | 96.179 | 13 | 4 | 96.696 |
| 14 | 1 | 94.111 | 14 | 3 | 96.179 |

2.3.3. GDSC2 Dataset Results

When the GDSC2 dataset is used with our feature selection and weighting method, the computational results are shown in Table 5, Table 6, Table 7 and Table 8. For the AUC-based drug compounds regarding the GDSC2 dataset, the highest (best) accuracy value with the minimum number of selected features is achieved by choosing 6 features, assigning LASSO = 1 and mRMR = 2 rank-based weights, a total weight value of 7, Random Forest (RF) model with a 96.573% ACC (Table 5). All results for this drug sensitivity metric and dataset are higher than a 96% ACC.
Table 5. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the AUC-based GDSC2 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 76 | 96.470 | 1 | 76 | 96.470 |
| 2 | 51 | 96.470 | 2 | 70 | 96.470 |
| 3 | 32 | 96.470 | 3 | 42 | 96.470 |
| 4 | 23 | 96.470 | 4 | 40 | 96.470 |
| 5 | 15 | 96.262 | 5 | 26 | 96.470 |
| 6 | 7 | 96.365 | 6 | 25 | 96.470 |
| 7 | 6 | 96.573 | 7 | 17 | 96.470 |
| 8 | 5 | 96.159 | 8 | 14 | 96.470 |
| 9 | 5 | 96.159 | 9 | 9 | 96.366 |
| 10 | 5 | 96.159 | 10 | 9 | 96.366 |
| 11 | 4 | 96.055 | 11 | 5 | 96.159 |
| 12 | 4 | 96.055 | 12 | 4 | 96.055 |
| 13 | 4 | 96.055 | 13 | 4 | 96.055 |
| 14 | 2 | 96.262 | 14 | 4 | 96.055 |
For the half maximal inhibitory concentration-based drug compounds regarding the GDSC2 dataset, the highest accuracy value with the minimum number of selected features is achieved by choosing 9 features, assigning LASSO = 2 and mRMR = 1 rank-based weights, a total weight value of 11, with a 96.677% ACC (Table 6). We can say that all results are around 96% for this specification.
Table 6. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the LN_IC50-based GDSC2 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 80 | 96.470 | 1 | 80 | 96.470 |
| 2 | 49 | 96.573 | 2 | 79 | 96.470 |
| 3 | 37 | 96.470 | 3 | 48 | 96.470 |
| 4 | 28 | 96.574 | 4 | 46 | 96.573 |
| 5 | 22 | 96.470 | 5 | 35 | 96.574 |
| 6 | 12 | 96.677 | 6 | 33 | 96.574 |
| 7 | 12 | 96.677 | 7 | 26 | 96.574 |
| 8 | 10 | 96.365 | 8 | 24 | 96.574 |
| 9 | 8 | 96.573 | 9 | 20 | 96.574 |
| 10 | 7 | 96.573 | 10 | 20 | 96.574 |
| 11 | 6 | 96.469 | 11 | 9 | 96.677 |
| 12 | 4 | 96.468 | 12 | 6 | 96.470 |
| 13 | 4 | 96.468 | 13 | 5 | 96.262 |
| 14 | 4 | 96.468 | 14 | 3 | 96.366 |
For the RMSE-based drug compounds regarding the GDSC2 dataset, the highest accuracy value with the minimum number of selected features is obtained with 8 selected features by assigning LASSO = 2 and mRMR = 1, with a total weight value of 12, and a 96.470% ACC using the RF model (Table 7). Although there are many alternatives in terms of the best ACC value, we chose this one due to the minimum number of features selected.
Table 7. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the RMSE-based GDSC2 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 92 | 96.470 | 1 | 92 | 96.470 |
| 2 | 61 | 96.470 | 2 | 87 | 96.470 |
| 3 | 37 | 96.470 | 3 | 56 | 96.470 |
| 4 | 25 | 96.470 | 4 | 53 | 96.470 |
| 5 | 20 | 96.470 | 5 | 34 | 96.470 |
| 6 | 15 | 96.470 | 6 | 31 | 96.470 |
| 7 | 14 | 96.470 | 7 | 22 | 96.470 |
| 8 | 9 | 96.365 | 8 | 20 | 96.470 |
| 9 | 9 | 96.365 | 9 | 17 | 96.470 |
| 10 | 4 | 96.366 | 10 | 16 | 96.470 |
| 11 | 3 | 96.366 | 11 | 13 | 96.470 |
| 12 | 3 | 96.366 | 12 | 8 | 96.470 |
| 13 | 3 | 96.366 | 13 | 3 | 96.366 |
| 14 | 3 | 96.366 | | | |
For the Z_SCORE-based drug compounds regarding the GDSC2 dataset, the highest accuracy value with the minimum number of selected features is obtained with 11 selected features, assigning LASSO = 1 and mRMR = 2 rank-based weights, a total weight value of 6, and a 96.781% ACC (Table 8).
Table 8. The impact of the feature selection and weighting method in terms of the accuracy rate (%) for the Z_SCORE-based GDSC2 dataset.
| k (LASSO = 1, mRMR = 2) | # of Features | RF ACC (%) | k (LASSO = 2, mRMR = 1) | # of Features | RF ACC (%) |
|---|---|---|---|---|---|
| 1 | 117 | 96.470 | 1 | 117 | 96.470 |
| 2 | 76 | 96.470 | 2 | 116 | 96.470 |
| 3 | 48 | 96.470 | 3 | 75 | 96.470 |
| 4 | 36 | 96.470 | 4 | 73 | 96.470 |
| 5 | 22 | 96.470 | 5 | 45 | 96.470 |
| 6 | 11 | 96.781 | 6 | 43 | 96.573 |
| 7 | 10 | 96.677 | 7 | 32 | 96.470 |
| 8 | 8 | 96.470 | 8 | 30 | 96.470 |
| 9 | 7 | 96.573 | 9 | 18 | 96.470 |
| 10 | 7 | 96.573 | 10 | 18 | 96.470 |
| 11 | 7 | 96.573 | 11 | 8 | 96.573 |
| 12 | 5 | 96.366 | 12 | 6 | 96.573 |
| 13 | 4 | 96.677 | 13 | 5 | 96.573 |
| 14 | 3 | 96.573 | 14 | 3 | 96.470 |
The selected feature names for each specification are presented in Supplementary Table S1. We also constructed SHAP-based plots for each dataset and drug sensitivity metric using the selected feature sets to observe their impact on the model output (see Figure 2 and Figure 3). Red indicates higher values of a feature, and blue indicates lower values. As can be observed from Figure 2, TGX221 is the most prominent (first-ranked) feature in terms of impact on the prediction model for the GDSC1 dataset. According to Figure 2c, THZ-2-49 is the first-ranked drug compound for the RMSE metric on this dataset. A higher value of TGX221 has a negative impact on the model output, while a higher value of THZ-2-49 has a positive impact on the RF model output. TGX221 and THZ-2-49 are therefore the most critical compounds (i.e., ranked first) according to both our feature selection methodology and the SHAP-based results.
As can be observed from Figure 3, POMHEX is the most prominent (first-ranked) feature in terms of impact on the prediction model for the GDSC2 dataset according to AUC. Similarly, Staurosporine is the first-ranked compound for LN_IC50 and Z_SCORE, and Taselisib is the most critical feature for RMSE on the GDSC2 dataset. A higher value of Staurosporine has a negative impact on the model output, while a higher value of POMHEX has a positive impact on the RF model output.

3. Discussion

In this study, we investigated the selection of drug compound features based on drug sensitivity metrics (i.e., LN_IC50, AUC, RMSE, and Z_SCORE) using the GDSC datasets. We aimed to identify discriminative and critical descriptors, i.e., the drug compounds that are most predictive of response variability between GBM and other cancer cell lines. This drug sensitivity-driven approach to feature selection is motivated by the increasing need in precision oncology for machine learning models that are not only accurate but also interpretable and focused on actionable outcomes. Furthermore, to address the common issue of missing data in the high-dimensional GDSC datasets, we employed iterative imputation as a data preprocessing step. Iterative imputation estimates missing values for each feature by iteratively using its relationships with all other available features. Compared to simpler techniques such as mean or median imputation, it allows for more accurate and context-aware estimates. This is advantageous in complex biomedical datasets, where the relationships among variables are often nonlinear and interdependent. Iterative imputation is a convenient, flexible, and popular technique that has been shown to reduce bias and preserve the integrity of the original data distribution better than univariate methods [35,36]. It does, however, require more computational resources than simpler methods [35,37]. We prioritized clear reporting of missingness and an iterative imputation strategy, rather than significance testing of differences that were widespread across the dataset. Since all variables exhibited missingness, we applied a consistent iterative imputation strategy across features and experiments. This ensured dataset integrity and reduced the bias that can arise from variable-specific handling.
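The contrast between univariate and iterative imputation can be illustrated on a toy matrix whose columns are strongly related. The data-generating process, the 10% missing rate, and the helper name `imputation_error` are assumptions for this sketch, and the relationships here are linear, so it shows only the general idea rather than the behavior on GDSC data.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)

# Three correlated columns driven by one latent factor (toy assumption).
base = rng.normal(size=(300, 1))
X_true = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(300, 1)),
    -base + rng.normal(scale=0.1, size=(300, 1)),
])

# Hide roughly 10% of the entries completely at random.
mask = rng.random(X_true.shape) < 0.1
X_missing = X_true.copy()
X_missing[mask] = np.nan


def imputation_error(imputer):
    """Root mean squared difference between imputed and true values on hidden cells."""
    X_hat = imputer.fit_transform(X_missing)
    return float(np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))


mean_err = imputation_error(SimpleImputer(strategy="mean"))
iter_err = imputation_error(IterativeImputer(random_state=0))
```

On data like this, `iter_err` should come out well below `mean_err`, because each hidden value can be predicted from the other columns in the same row.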
We analyzed the selected feature subsets for each dataset and metric with SHAP-based plots to observe and interpret their impact on the model predictions from an explainable AI perspective. The results highlight the utility of integrating drug response data directly into the feature selection process, potentially leading to more informative and biologically meaningful results. AUC and IC50 directly describe the sensitivity and potency of the drug, respectively, while the RMSE itself is not a sensitivity measure but rather a representation of the curve-fit error, and the Z_SCORE is a means of data normalization. Given this, RMSE and Z_SCORE are less biologically interpretable but can provide a means to validate the method, as critical compounds may not be arguably selected by AUC and IC50, while RMSE and Z_SCOREs would not be used as ground truth labels of drug sensitivity.
One of the key findings of this work is that certain compound features consistently showed high importance, with TGX221 emerging in GDSC1 and POMHEX emerging in GDSC2. TGX221 is not present in GDSC2, and POMHEX is not present in GDSC1. Although compounds are shared between the datasets, only two compounds that were both selected and shared, namely Afatinib and Navitoclax, were identified, and some of the compounds identified above could not have been shared because they were not present in the other dataset. This illustrates that the data present in the datasets are critical to achieving drug repurposing. Combining the datasets, however, would not be recommended given critical distinctions between the two sets. Using dataset covariates, the two datasets could be harmonized; however, only 122 compounds are shared between the two sets, which limits the data available for machine learning. We also found that the key identified compounds were critical to performance for AUC and IC50, but not for RMSE, given that it is not a sensitivity measure. TGX221 has the putative target PI3Kbeta and hence acts on the PI3K/MTOR signaling pathway. TGX221 has been shown to induce apoptosis in GBM cells as well as impair migration and invasion [38]. POMHEX is listed as an unclassified compound in GDSC2. In the literature, POMHEX is described as a small-molecule Enolase inhibitor aimed at targeting glycolysis in cancer, and it has been employed in ENO-1 deleted glioma cells [39]. Currently, neither compound has been employed in humans, and there are no data beyond preclinical work.
There are several limitations that must be acknowledged. First, the GDSC datasets, while comprehensive, are still limited by the number of compounds tested and the cell line models used. The findings related to molecular and pharmacologic responses observed in vitro should not be interpreted as evidence of demographic effects in patient populations. The results may be shaped by how the cell lines were derived and by the repository selection process used. Most of the drugs in these datasets are not FDA-approved: there are only 28 FDA-approved drugs in GDSC1 and 31 in GDSC2 (Supplementary Table S4). Another limitation of this study is the lack of single-cell-level analysis. Because tumor samples exhibit substantial cellular heterogeneity, relying solely on bulk measurements can mask subpopulation-specific signals that are relevant to treatment response. As single-cell sequencing technologies continue to advance, capturing variability across cell states and cell types has become increasingly critical for both mechanistic understanding and predictive modeling [40]. The lack of deep learning-based methods is a further limitation of our study. As highlighted by Zhang et al. [41], deep learning techniques show considerable potential for improving predictive accuracy and clinical applicability. Despite this limitation, the framework delivers reliable and interpretable findings on pharmacologic responses, laying the groundwork for future deep learning applications. Incorporating such approaches will be an important direction for future research to further strengthen the translational impact of our framework. We employed stratified five-fold cross-validation to preserve the natural class distribution in each fold. This approach ensures that both training and evaluation sets reflect the true imbalance of the dataset and prevents minority classes from being excluded during the training or evaluation phase.
No oversampling or undersampling techniques were applied, as our goal was to maintain biological validity and preserve the robustness of the results rather than artificially balance the datasets. While the stratified cross-validation approach preserved class proportions across folds, it did not resolve the underlying imbalance problem. While overall accuracy was around 96%, the balanced accuracy was around 55%, underscoring the challenge of minority-class detection. We report this metric transparently and note that our study’s primary contribution lies in the biological insights from cell line drug responses obtained through the feature selection methodology. In summary, selecting drug compound features based on sensitivity data shows strong potential for improving how we model and understand drug responses in GBM versus other cancer cell lines. This approach improves predictive model performance, and it can also offer valuable insights. As both data and computational tools continue to evolve, we believe this type of hybrid feature selection will play an important role in advancing precision oncology and supporting more effective drug discovery.
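The gap between accuracy and balanced accuracy under class imbalance can be made concrete with a small worked example. The confusion-matrix counts below are hypothetical, chosen only to reproduce the orders of magnitude mentioned above (about 96% accuracy and about 55% balanced accuracy); they are not taken from the study.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (minority-class recall) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 35 GBM cell lines among 1000, only 4 of them detected.
counts = dict(tp=4, tn=960, fp=5, fn=31)

acc = accuracy(**counts)            # 964 / 1000 = 0.964
bal = balanced_accuracy(**counts)   # (4/35 + 960/965) / 2, roughly 0.555
```

Because the majority class dominates the total, a model can miss most minority-class samples and still score a high accuracy, which is why the two metrics diverge here.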

4. Materials and Methods

This section outlines the experimental design, data sources, and analytical methods used to investigate our feature selection and weighting methodology, including definitions, techniques, and prediction models utilized in this study.

4.1. Datasets

The Genomics of Drug Sensitivity in Cancer (GDSC) [42] Project is a Wellcome-funded collaborative effort between the Cancer Genome Project at the Wellcome Sanger Institute (UK) and the Center for Molecular Therapeutics at Massachusetts General Hospital Cancer Center (USA). This partnership brings together the strengths of both institutions to identify cancer biomarkers that can help predict which genetically defined patient subgroups are most likely to benefit from specific cancer treatments [42].
The characteristics of the GDSC datasets used (i.e., GDSC1 and GDSC2) are presented in detail in Table 9. GDSC1 and GDSC2 are two complementary parts of the GDSC project, created to systematically profile the sensitivity of cancer cell lines to anti-cancer drugs and to associate this sensitivity with genomic features. The datasets differ in the time period when the data were obtained (GDSC2 being more recent), in the dose design used across drug concentrations to measure dose–response relationships, and in the control and storage of the compounds employed. They share the same analytic approach, which fits a dose–response curve and extracts summary metrics such as IC50 (i.e., half-maximal inhibitory concentration) and AUC (Area Under the Curve). For these reasons, the two datasets are analyzed separately but are complementary to each other [42]. There are 122 drug compounds (i.e., features) shared between the GDSC1 and GDSC2 datasets (Supplementary Table S2). Of these shared compounds, 22 appear in our selected feature sets when all metrics and both datasets are considered (Supplementary Table S3). We used the same parameter settings for both datasets, with clinical usability and drug repurposing in mind.

4.2. Methodology

This section provides a general overview of the feature selection and weighting architecture we employ, including a brief description of the essential methodologies.

Proposed Scheme

The methodology follows our previous studies (Tasci et al. [7,43]). We constructed drug compound-based cell line datasets with binary class labels (GBM versus other cancers). Our previous methodology was then applied to the drug compound-based GDSC datasets for each drug sensitivity metric. The scheme includes two phases: (i) feature weighting and (ii) feature selection (Figure 4).
The general operation of the methodology can be summarized as follows. Two popular feature selection methods, LASSO and mRMR, are used for the feature selection tasks. For each fold of the cross-validation, we keep all selected features in a list with a count. The count of each selected feature is increased by its feature weight (i.e., 1 or 2, depending on the importance of the method that selected it). We then evaluate the candidate feature sets defined by the different total weight values in terms of accuracy. After trying all combinations among these schemes, we select the final feature set that has the maximum accuracy with the minimum number of selected features. This approach provides an efficient way to reduce dimensionality in high-dimensional datasets by identifying the relevant features.
In other words, we employed a rank-based feature weighting methodology in this study. The LASSO and mRMR feature selection methods were assessed according to their predictive performance (i.e., accuracy for classification tasks), and the better-performing method was assigned the higher weight value. For each cross-validation fold, features selected by LASSO or mRMR received weights of 2 or 1, respectively, contributing to their cumulative importance scores. Because five-fold cross-validation was used, k (i.e., total weight) values between 1 and 15 were investigated. If a feature is chosen in every fold by both feature selection methods, its total weight reaches the maximum of 15 (i.e., 2 × 5 folds + 1 × 5 folds). Conversely, if no feature is chosen in every fold by both methods, the maximum total weight will be lower than 15. If a feature is selected in only one fold by only one feature selection method, its total weight will be at least 1 (exactly 1 when that method carries the weight of 1). We then evaluated all ranked weight combinations to identify the most effective selection scheme and to optimize predictive performance for the corresponding dataset (i.e., maximum accuracy with the minimum number of selected features).
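The weighting and selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the per-fold selections and drug names (`drug_A`, `drug_B`, `drug_C`) are invented, `toy_score` is a hypothetical stand-in for the cross-validated random forest accuracy, and the rule that a candidate set contains every feature whose total weight is at least k is an assumption, since the text does not state whether the threshold is inclusive or exact.

```python
from collections import Counter


def accumulate_weights(lasso_folds, mrmr_folds, w_lasso=2, w_mrmr=1):
    """Sum the per-fold weights of every feature selected by either method."""
    totals = Counter()
    for selected in lasso_folds:
        for name in set(selected):
            totals[name] += w_lasso
    for selected in mrmr_folds:
        for name in set(selected):
            totals[name] += w_mrmr
    return totals


def pick_best(totals, max_weight, score):
    """Try each threshold k; keep the best score, breaking ties by fewer features."""
    best = None
    for k in range(1, max_weight + 1):
        # Assumption: a candidate set holds all features with total weight >= k.
        subset = sorted(f for f, w in totals.items() if w >= k)
        if not subset:
            continue
        key = (score(subset), -len(subset))
        if best is None or key > best[0]:
            best = (key, k, subset)
    if best is None:
        raise ValueError("no feature reached any threshold")
    return best[1], best[2]


# Hypothetical selections from five cross-validation folds.
lasso_folds = [["drug_A", "drug_B"], ["drug_A"], ["drug_A", "drug_C"],
               ["drug_A"], ["drug_A", "drug_B"]]
mrmr_folds = [["drug_A", "drug_C"], ["drug_A"], ["drug_A"],
              ["drug_A", "drug_B"], ["drug_A"]]


def toy_score(subset):
    """Hypothetical stand-in for cross-validated accuracy of a feature subset."""
    return 0.9 if "drug_A" in subset else 0.5


totals = accumulate_weights(lasso_folds, mrmr_folds)
max_weight = (2 + 1) * len(lasso_folds)  # 2 x 5 folds + 1 x 5 folds = 15
best_k, best_features = pick_best(totals, max_weight, toy_score)
print(dict(totals))
print(best_k, best_features)
```

In this toy run, `drug_A` is chosen in every fold by both methods and reaches the maximum total weight of 15, while `drug_B` and `drug_C` accumulate 5 and 3. Because the stand-in score is identical for every candidate set containing `drug_A`, the tie-break on set size returns the single-feature set at the first threshold that excludes the other two (k = 6).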

5. Conclusions

This study presents a hybrid machine learning-based feature selection approach for drug sensitivity-driven compound selection in cancer cell lines. A random forest model combined with different rank-based weighting schemes is assessed to identify robust drug compound candidates that discriminate GBM from other cancer cell lines in a high-dimensional, large-scale bioinformatics dataset. Ongoing research in this field has the potential to improve GBM treatment outcomes and to contribute to cancer-related work more broadly, particularly in drug discovery and personalized medicine. Future work can focus on applying our hybrid feature selection method across diverse biomedical datasets to further improve the generalization of the model and results. Future efforts can also aim to incorporate single-cell-derived features into our framework, drawing on methodologies such as those presented by Lai et al. [40], which utilize single-cell-resolved signatures to improve response prediction and deepen biological interpretation. Another direction for future work is to combine cell line-based results with cohorts derived from genome-wide association studies, to better connect molecular mechanisms and broaden the scope of this study. We will also focus on expanding datasets to improve minority-class representation in a biologically consistent manner.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms262411871/s1.

Author Contributions

E.T.: Conceptualization, Data Curation, Methodology, Software, Investigation, Writing—Original Draft Preparation, Visualization, Review and Editing. K.C.: Supervision, Funding acquisition. A.V.K.: Conceptualization, Investigation, Supervision, Project administration, Funding acquisition, Writing—original draft preparation, Visualization, Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

Funding was provided in part by the NCI NIH intramural program (ZID BC 010990).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available in Genomics of Drug Sensitivity in Cancer (GDSC) at https://www.cancerrxgene.org/downloads/bulk_download, accessed on 5 September 2025. The related datasets were derived from this resource available in the public domain.

Conflicts of Interest

The authors declare that they have no competing interests.

Abbreviations

ACC: Accuracy
AI: Artificial Intelligence
AUC: Area Under the Curve
CRT: Chemoirradiation
FDA: Food and Drug Administration
FS: Feature Selection
FW: Feature Weighting
GBM: Glioblastoma Multiforme
GDSC: Genomics of Drug Sensitivity in Cancer
LASSO: Least Absolute Shrinkage and Selection Operator
LN_IC50: Natural logarithm of the half-maximal inhibitory concentration (IC50)
ML: Machine Learning
MRMR: Minimum Redundancy–Maximum Relevance
NCI: National Cancer Institute
NIH: National Institutes of Health
RF: Random Forest
RMSE: Root Mean Squared Error
RT: Radiation Therapy
SHAP: SHapley Additive exPlanations
TMZ: Temozolomide
Z_SCORE: Standard score

References

  1. Carrano, A.; Zarco, N.; Phillipps, J.; Lara-Velazquez, M.; Suarez-Meade, P.; Norton, E.S.; Chaichana, K.L.; Quiñones-Hinojosa, A.; Asmann, Y.W.; Guerrero-Cázares, H. Human cerebrospinal fluid modulates pathways promoting glioblastoma malignancy. Front. Oncol. 2021, 11, 624145.
  2. Patel, A.P.; Fisher, J.L.; Nichols, E.; Abd-Allah, F.; Abdela, J.; Abdelalim, A.; Abraha, H.N.; Agius, D.; Alahdab, F.; Alam, T. Global, regional, and national burden of brain and other CNS cancer, 1990–2016: A systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019, 18, 376–393.
  3. Ostrom, Q.T.; Cioffi, G.; Gittleman, H.; Patil, N.; Waite, K.; Kruchko, C.; Barnholtz-Sloan, J.S. CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2012–2016. Neuro-Oncol. 2019, 21, v1–v100.
  4. Tamimi, A.F.; Juweid, M. Epidemiology and outcome of glioblastoma. In Glioblastoma; Codon Publications: Brisbane, Australia, 2018; pp. 143–153.
  5. Dasari, A.; Saini, M.; Sharma, S.; Bergemann, R. Pro15 healthcare resource utilisation and economic burden of glioblastoma in the United States: A systematic review. Value Health 2020, 23, S331.
  6. Stupp, R.; Mason, W.P.; Van Den Bent, M.J.; Weller, M.; Fisher, B.; Taphoorn, M.J.; Belanger, K.; Brandes, A.A.; Marosi, C.; Bogdahn, U.; et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N. Engl. J. Med. 2005, 352, 987–996.
  7. Tasci, E.; Popa, M.; Zhuge, Y.; Chappidi, S.; Zhang, L.; Zgela, T.C.; Sproull, M.; Mackey, M.; Kates, H.R.; Garrett, T.J. MetaWise: Combined feature selection and weighting method to link the serum metabolome to treatment response and survival in glioblastoma. Int. J. Mol. Sci. 2024, 25, 10965.
  8. Rock, K.; McArdle, O.; Forde, P.; Dunne, M.; Fitzpatrick, D.; O’Neill, B.; Faul, C. A clinical review of treatment outcomes in glioblastoma multiforme—The validation in a non-trial population of the results of a randomised Phase III clinical trial: Has a more radical approach improved survival? Br. J. Radiol. 2012, 85, e729–e733.
  9. American Association of Neurological Surgeons. Brain Tumors. Available online: https://www.aans.org/en/Patients/Neurosurgical-Conditions-and-Treatments/Brain-Tumors (accessed on 29 May 2025).
  10. Hanif, F.; Muzaffar, K.; Perveen, K.; Malhi, S.M.; Simjee, S.U. Glioblastoma multiforme: A review of its epidemiology and pathogenesis through clinical presentation and treatment. Asian Pac. J. Cancer Prev. APJCP 2017, 18, 3.
  11. Srivastava, S.; Anbiaee, R.; Houshyari, M.; Laxmi; Sridhar, S.B.; Ashique, S.; Hussain, S.; Kumar, S.; Taj, T.; Akbarnejad, Z.; et al. Amino acid metabolism in glioblastoma pathogenesis, immune evasion, and treatment resistance. Cancer Cell Int. 2025, 25, 89.
  12. Möhn, N.; Hounchonou, H.F.; Nay, S.; Schwenkenbecher, P.; Grote-Levi, L.; Al-Tarawni, F.; Esmaeilzadeh, M.; Schuchardt, S.; Schwabe, K.; Hildebrandt, H.; et al. Metabolomic profile of cerebrospinal fluid from patients with diffuse gliomas. J. Neurol. 2024, 271, 6970–6982, Erratum in J. Neurol. 2024, 271, 7654. https://doi.org/10.1007/s00415-024-12722-5.
  13. Zhao, R.; Pan, Z.; Li, B.; Zhao, S.; Zhang, S.; Qi, Y.; Qiu, J.; Gao, Z.; Fan, Y.; Guo, Q.; et al. Comprehensive Analysis of the Tumor Immune Microenvironment Landscape in Glioblastoma Reveals Tumor Heterogeneity and Implications for Prognosis and Immunotherapy. Front. Immunol. 2022, 13, 820673.
  14. Wang, L.B.; Karpova, A.; Gritsenko, M.A.; Kyle, J.E.; Cao, S.; Li, Y.; Rykunov, D.; Colaprico, A.; Rothstein, J.H.; Hong, R.; et al. Proteogenomic and metabolomic characterization of human glioblastoma. Cancer Cell 2021, 39, 509–528.e20.
  15. Yanovich-Arad, G.; Ofek, P.; Yeini, E.; Mardamshina, M.; Danilevsky, A.; Shomron, N.; Grossman, R.; Satchi-Fainaro, R.; Geiger, T. Proteogenomics of glioblastoma associates molecular patterns with survival. Cell Rep. 2021, 34, 108787.
  16. Stupp, R.; Taillibert, S.; Kanner, A.; Read, W.; Steinberg, D.M.; Lhermitte, B.; Toms, S.; Idbaih, A.; Ahluwalia, M.S.; Fink, K.; et al. Effect of Tumor-Treating Fields Plus Maintenance Temozolomide vs Maintenance Temozolomide Alone on Survival in Patients With Glioblastoma: A Randomized Clinical Trial. JAMA 2017, 318, 2306–2316.
  17. Stupp, R.; Hegi, M.E.; Gorlia, T.; Erridge, S.C.; Perry, J.; Hong, Y.K.; Aldape, K.D.; Lhermitte, B.; Pietsch, T.; Grujicic, D.; et al. Cilengitide combined with standard treatment for patients with newly diagnosed glioblastoma with methylated MGMT promoter (CENTRIC EORTC 26071-22072 study): A multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 2014, 15, 1100–1108.
  18. Yalamarty, S.S.K.; Filipczak, N.; Li, X.; Subhan, M.A.; Parveen, F.; Ataide, J.A.; Rajmalani, B.A.; Torchilin, V.P. Mechanisms of Resistance and Current Treatment Options for Glioblastoma Multiforme (GBM). Cancers 2023, 15, 2116.
  19. Brennan, C.W.; Verhaak, R.G.; McKenna, A.; Campos, B.; Noushmehr, H.; Salama, S.R.; Zheng, S.; Chakravarty, D.; Sanborn, J.Z.; Berman, S.H.; et al. The somatic genomic landscape of glioblastoma. Cell 2013, 155, 462–477.
  20. Verhaak, R.G.; Hoadley, K.A.; Purdom, E.; Wang, V.; Qi, Y.; Wilkerson, M.D.; Miller, C.R.; Ding, L.; Golub, T.; Mesirov, J.P.; et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17, 98–110.
  21. Anwer, M.S.; Abdel-Rasol, M.A.; El-Sayed, W.M. Emerging therapeutic strategies in glioblastsoma: Drug repurposing, mechanisms of resistance, precision medicine, and technological innovations. Clin. Exp. Med. 2025, 25, 117.
  22. Tan, S.K.; Jermakowicz, A.; Mookhtiar, A.K.; Nemeroff, C.B.; Schürer, S.C.; Ayad, N.G. Drug Repositioning in Glioblastoma: A Pathway Perspective. Front. Pharmacol. 2018, 9, 218.
  23. De Sousa-Coelho, A.L.; Solaković, B.; Bento, A.D.; Fernandes, M.T. Drug Repurposing for Targeting Cancer Stem-like Cells in Glioblastoma. Cancers 2025, 17, 2999.
  24. Linhares, P.; Carvalho, B.; Vaz, R.; Costa, B.M. Glioblastoma: Is There Any Blood Biomarker with True Clinical Relevance? Int. J. Mol. Sci. 2020, 21, 5809.
  25. Ntafoulis, I.; Koolen, S.L.W.; van Tellingen, O.; den Hollander, C.W.J.; Sabel-Goedknegt, H.; Dijkhuizen, S.; Haeck, J.; Reuvers, T.G.A.; de Bruijn, P.; van den Bosch, T.P.P.; et al. A Repurposed Drug Selection Pipeline to Identify CNS-Penetrant Drug Candidates for Glioblastoma. Pharmaceuticals 2024, 17, 1687.
  26. Sun, S.; Shyr, Z.; McDaniel, K.; Fang, Y.; Tao, D.; Chen, C.Z.; Zheng, W.; Zhu, Q. Reversal Gene Expression Assessment for Drug Repurposing, a Case Study of Glioblastoma. Res. Sq. 2024.
  27. Marcuello, C.; Lim, K.; Nisini, G.; Pokrovsky, V.S.; Conde, J.; Ruggeri, F.S. Nanoscale Analysis beyond Imaging by Atomic Force Microscopy: Molecular Perspectives on Oncology and Neurodegeneration. Small Sci. 2025, 5, e202500351.
  28. Masud, N.; Hasib, M.H.H.; Ibironke, B.; Block, C.; Hughes, J.; Ekpenyong, A.; Sarkar, A. Exploring the heterogeneity in glioblastoma cellular mechanics using in-vitro assays and atomic force microscopy. Sci. Rep. 2025, 15, 19302.
  29. Xia, Y.; Sun, M.; Huang, H.; Jin, W.-L. Drug repurposing for cancer therapy. Signal Transduct. Target. Ther. 2024, 9, 92.
  30. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  31. Tasci, E.; Zhuge, Y.; Camphausen, K.; Krauze, A.V. Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets. Cancers 2022, 14, 2897.
  32. IterativeImputer. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html (accessed on 4 September 2025).
  33. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  34. Joo, M.; Park, A.; Kim, K.; Son, W.-J.; Lee, H.S.; Lim, G.; Lee, J.; Lee, D.H.; An, J.; Kim, J.H. A deep learning model for cell growth inhibition IC50 prediction and its application for gastric cancer patients. Int. J. Mol. Sci. 2019, 20, 6276.
  35. Hameed, W.M.; Ali, N.A. Missing value imputation techniques: A survey. UHD J. Sci. Technol. 2023, 7, 72–81.
  36. Liu, J.; Gelman, A.; Hill, J.; Su, Y.-S.; Kropko, J. On the stationary distribution of iterative imputations. Biometrika 2014, 101, 155–173.
  37. Imputation of Missing Values. Available online: https://scikit-learn.org/stable/modules/impute.html#impute (accessed on 23 September 2025).
  38. Yang, X.; Yang, J.A.; Liu, B.H.; Liao, J.M.; Yuan, F.E.; Tan, Y.Q.; Chen, Q.X. TGX-221 inhibits proliferation and induces apoptosis in human glioblastoma cells. Oncol. Rep. 2017, 38, 2836–2842.
  39. Lin, Y.H.; Satani, N.; Hammoudi, N.; Yan, V.C.; Barekatain, Y.; Khadka, S.; Ackroyd, J.J.; Georgiou, D.K.; Pham, C.D.; Arthur, K.; et al. An enolase inhibitor for the targeted treatment of ENO1-deleted cancers. Nat. Metab. 2020, 2, 1413–1426.
  40. Lai, G.; Xie, B.; Zhang, C.; Zhong, X.; Deng, J.; Li, K.; Liu, H.; Zhang, Y.; Liu, A.; Liu, Y. Comprehensive analysis of immune subtype characterization on identification of potential cells and drugs to predict response to immune checkpoint inhibitors for hepatocellular carcinoma. Genes Dis. 2025, 12, 101471.
  41. Zhang, C.; Yang, J.; Chen, S.; Sun, L.; Li, K.; Lai, G.; Peng, B.; Zhong, X.; Xie, B. Artificial intelligence in ovarian cancer drug resistance advanced 3PM approach: Subtype classification and prognostic modeling. EPMA J. 2024, 15, 525–544.
  42. Yang, W.; Soares, J.; Greninger, P.; Edelman, E.J.; Lightfoot, H.; Forbes, S.; Bindal, N.; Beare, D.; Smith, J.A.; Thompson, I.R. Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2012, 41, D955–D961.
  43. Tasci, E.; Chappidi, S.; Zhuge, Y.; Zhang, L.; Cooley Zgela, T.; Sproull, M.; Mackey, M.; Camphausen, K.; Krauze, A.V. GLIO-Select: Machine Learning-Based Feature Selection and Weighting of Tissue and Serum Proteomic and Metabolomic Data Uncovers Sex Differences in Glioblastoma. Int. J. Mol. Sci. 2025, 26, 4339.
Figure 1. Example missing-value statistics for the GDSC1 and GDSC2 datasets based on the AUC metric: (a) GDSC1; (b) GDSC2.
Figure 2. GDSC1 dataset SHAP-based plots: (a) AUC; (b) LN_IC50; (c) RMSE; (d) Z_SCORE.
Figure 3. GDSC2 dataset SHAP-based plots: (a) AUC; (b) LN_IC50; (c) RMSE; (d) Z_SCORE.
Figure 4. Overview of our proposed scheme for feature selection tasks on drug compound-based GDSC datasets.
Table 9. Characteristics of the GDSC datasets used and the methodology.
                             GDSC1 Dataset           GDSC2 Dataset
Total # of Instances         968                     963
# of GBM/# of Non-GBM        33/935                  34/929
Total # of Features          378                     286
Feature Selection Methods    mRMR + LASSO            mRMR + LASSO
Cross-Validation Type        Stratified 5-fold CV    Stratified 5-fold CV
Classification Model         Random Forest           Random Forest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tasci, E.; Camphausen, K.; Krauze, A.V. Drug Repurposing in Glioblastoma Using a Machine Learning-Based Hybrid Feature Selection Approach. Int. J. Mol. Sci. 2025, 26, 11871. https://doi.org/10.3390/ijms262411871