Baseline Radiomics as a Prognostic Tool for Clinical Benefit from Immune Checkpoint Inhibition in Inoperable NSCLC Without Activating Mutations

Moiseenko, Fedor; Radulovic, Marko; Tsvetkova, Nadezhda; Chernobrivceva, Vera; Gabina, Albina; Oganesian, Any; Makarkina, Maria; Elsakova, Ekaterina; Krasavina, Maria; Barsova, Daria; Artemeva, Elizaveta; Khenshtein, Valeria; Levchenko, Natalia; Chubenko, Viacheslav; Egorenkov, Vitaliy; Volkov, Nikita; Bogdanov, Alexei; Moiseyenko, Vladimir

doi:10.3390/cancers17111790


by Fedor Moiseenko ¹,², Marko Radulovic ³,*, Nadezhda Tsvetkova ¹, Vera Chernobrivceva ¹, Albina Gabina ¹,², Any Oganesian ¹,⁴, Maria Makarkina ¹, Ekaterina Elsakova ¹, Maria Krasavina ¹, Daria Barsova ¹, Elizaveta Artemeva ¹, Valeria Khenshtein ¹, Natalia Levchenko ¹, Viacheslav Chubenko ¹, Vitaliy Egorenkov ¹, Nikita Volkov ¹, Alexei Bogdanov ¹ and Vladimir Moiseyenko ¹

¹ N.P. Napalkov Saint Petersburg Clinical Research and Practical Centre for Specialized Types of Medical Care (Oncological), Leningradskaya Str. 68A, 197758 Saint Petersburg, Russia

² N.N. Petrov National Medical Research Center of Oncology, Ministry of Public Health of the Russian Federation, Leningradskaya Str. 68, 197758 Saint Petersburg, Russia

³ Department of Experimental Oncology, Institute of Oncology and Radiology of Serbia, 11000 Belgrade, Serbia

⁴ Department of Oncology, Pediatric Oncology and Radiation Therapy, St.-Petersburg State Pediatric Medical University, St. Lithuanian 2, 194100 Saint Petersburg, Russia

* Author to whom correspondence should be addressed.

Cancers 2025, 17(11), 1790; https://doi.org/10.3390/cancers17111790

Submission received: 9 April 2025 / Revised: 16 May 2025 / Accepted: 26 May 2025 / Published: 27 May 2025

(This article belongs to the Special Issue Enhancing Precision in Cancer Treatment: AI-Driven Innovations in Imaging)


Simple Summary

This study introduces a powerful machine learning-based radiomics approach to help improve predictions of immunotherapy outcomes in patients with non-small cell lung cancer (NSCLC). We believed that the full potential of CT scan-based tumor analysis had not been achieved, partly due to limited use of model integrations (ensembles) in previous research. To address this, we tested 1680 combinations of data processing and machine learning methods, selecting the best-performing ones to create an integrated (ensemble) model. Using clinical and imaging data, our final model achieved an AUC of 0.86 for predicting 24-month patient survival, which, to our knowledge, exceeds previously published results for this diagnosis and disease outcomes. This approach reduces the weaknesses of relying on a single model and offers a more reliable and accurate tool for predicting immunotherapy outcomes.

Abstract

Background/Objectives: Immune checkpoint inhibitors (ICIs) are key therapies for NSCLC, but current selection criteria, such as excluding mutation carriers and assessing PD-L1, lack sensitivity. As a result, many patients receive costly treatments with limited benefit. Therefore, this study aimed to predict which NSCLC patients would achieve durable survival (≥24 months) with immunotherapy. Methods: A comprehensive ensemble radiomics approach was applied to pretreatment CT scans to prognosticate overall survival (OS) and predict progression-free survival (PFS) in a cohort of 220 consecutive patients with inoperable NSCLC treated with first-line ICIs (pembrolizumab, atezolizumab, nivolumab, or prolgolimab) as monotherapy or in combination. The radiomics pipeline evaluated four normalization methods (none, min-max, Z-score, mean), four feature selection techniques (ANOVA, RFE, Kruskal–Wallis, Relief), and ten classifiers (e.g., SVM, random forest). Using two to eight radiomics features, 1680 models were built in the Feature Explorer (FAE) Python package. Results: Three feature sets were evaluated: clinicopathological (CP) only, radiomics only, and a combined set, using 6- and 12-month PFS and 24-month OS endpoints. The top 15 models were ensembled by averaging their probability scores. The best performance was achieved at 24-month OS with the combined CP and radiomics ensemble (AUC = 0.863, accuracy = 85%), followed by radiomics-only (AUC = 0.796, accuracy = 82%) and CP-only (AUC = 0.671, accuracy = 76%). Predictive performance was lower for 6-month (AUC = 0.719) and 12-month PFS (AUC = 0.739) endpoints. Conclusions: Our radiomics pipeline improved selection of NSCLC patients for immunotherapy and could spare non-responders unnecessary toxicity while enhancing cost-effectiveness.

Keywords:

NSCLC; prediction; prognosis; radiomics; machine learning; ensemble; immunotherapy; checkpoint inhibitors


1. Introduction

Non-small cell lung cancer (NSCLC) is a common, deadly, and highly heterogeneous tumor type. This heterogeneity is evident in its clinical course, histological characteristics, genetic and expression profiles, and in responses to treatment. Different treatment modalities can lead to diverse outcomes; for example, some therapies provide immediate tumor shrinkage and symptom improvement, while others deliver a survival advantage over the long term [1]. The combination of these benefits forms the basis for treatment decisions in clinical practice.

Historically, chemotherapy was the main NSCLC treatment, often resulting in significant tumor shrinkage, though typically for only a short duration. Targeted therapies, on the other hand, have shown the potential for longer-lasting responses but are limited to a small group of patients with specific molecular aberrations. More recently, immunotherapy, particularly checkpoint inhibitors, has emerged as a transformative treatment, offering remarkably long survival benefits for a subset of patients [2].

The selection of patients who are likely to benefit from immunotherapy is based on several factors, including the carcinoma’s origin (with smoking history playing a role), the molecular genetic profile (such as mutations associated with sensitivity or resistance and tumor mutational burden), and signs of immune system activation. Among these factors, PD-L1 expression is currently the primary criterion for estimating both the likelihood of an overall response and a long-lasting response. PD-L1 also serves as a marker of the mechanism by which tumor cells protect themselves from activated immune cells, highlighting the complex interactions among the tumor, its microenvironment, and the host organism. However, PD-L1 status has notable limitations; for instance, approximately 20% of patients with high PD-L1 expression still experience early progression, while a proportion of PD-L1-negative patients do respond to treatment. These challenges underscore the need for more refined patient-selection strategies, with the goal of improving outcomes and personalizing treatments in the heterogeneous landscape of NSCLC.

The selection of NSCLC patients who are likely to benefit from immunotherapy is increasingly approached using CT radiomics, which reveals subtle intra-tumor heterogeneity that cannot be identified by visual inspection of imaging scans. Radiomics quantitatively characterizes tumor imaging data by extracting features that capture tumor size, shape, voxel intensity distribution, spatial relationships, and texture patterns within the tumor volume of interest (VOI) [3]. These features are then entered into machine learning models to classify patients according to their clinical outcomes and therapy response [4].

Current state-of-the-art approaches for classifying therapy response and disease outcomes in NSCLC still leave considerable room for improvement [5]. Radiomics achieves only moderate predictive power on held-out test sets, with reported AUCs rarely exceeding 0.75. Consequently, radiomics currently has the potential to become a supportive biomarker rather than a stand-alone decision tool. Progress is limited by the absence of optimized, standardized, and widely adopted pipelines for feature selection and classification, by only sporadic use of ensemble techniques, and by limited reproducibility due to rare sharing of code, hyperparameters, and imaging protocols. Until sequential internal validation and external validation become routine, radiomics will remain confined to proof-of-concept research instead of becoming a reliable clinical tool [6,7,8,9,10,11,12,13]. To address these gaps, we aimed to improve the consistency and predictive accuracy of radiomics-based classification by employing an ensemble approach that integrates the best-performing individual models. Such an ensemble radiomics framework mitigates the limitations of single-model variability and improves overall robustness and generalizability, providing a foundation for more reproducible and accurate outcome prediction in future studies.

The novelty of this study lies in the systematic evaluation of 1680 standardized pipelines combining normalization, preprocessing, feature selection, and classification steps. It also introduces an ensemble framework that integrates top-performing models to reduce reliance on any single model’s chance success and to enhance predictive stability. Importantly, ensembling was performed by soft voting, that is, by averaging the continuous probability scores of individual models, which avoids the bias that hard voting introduces through the choice of probability score thresholds. Several previous ensemble studies in NSCLC used the XGBoost classifier alone [14,15], whereas we used the similar AdaBoost classifier alongside nine others. One study ensembled five classifiers by hard voting with a fixed 0.5 probability score threshold for each individual model [16]. The most comprehensive prior approach combined RF, SVM, and LASSO to build 54 predictive models [8]. Notably, many studies use the term “ensemble” to describe the integration of multiple feature types rather than distinct radiomic workflows [8,15,17]. Our current study thus presents a substantial methodological advance by addressing workflow optimization and standardization through an exhaustive pipeline, supported by openly shared code for reproducibility and wider adoption.

Motivated by the clinical need for more reliable prognostic tools in immunotherapy-treated NSCLC patients, we aimed to implement a comprehensive workflow to systematically explore 1680 combinations of normalization, preprocessing, feature selection, and classification strategies on pretreatment CT scans. The top-performing models were then integrated into a unified ensemble radiomics signature, aiming to enhance generalizability, reduce variance due to chance performance of individual models, and support more robust prognostic assessments.

2. Materials and Methods

2.1. Patient Group

We used a retrospective database of patients treated for inoperable NSCLC at the N.P. Napalkov Saint Petersburg Clinical Research and Practical Centre for Specialized Types of Medical Care (Oncological) in 2021. This study was approved by the institutional Ethics Committee (Approval No. 3, dated 14 March 2023). Patients received checkpoint inhibitors (pembrolizumab and its biosimilars, atezolizumab, nivolumab, or prolgolimab) as first-line palliative therapy, either as monotherapy or in combination.

Only patients with available pre-immunotherapy images were included. Outcomes were defined as 6- or 12-month progression-free survival (PFS) and 24-month overall survival (OS). Clinical characteristics are presented in Table 1. All patients were treated according to national guidelines and multidisciplinary team (MDT) decisions, receiving a checkpoint inhibitor either alone or with chemotherapy as part of routine care. Treatment response was evaluated with CT every 6–8 weeks, adverse events were monitored per local practice, and overall survival was determined via patient phone contact or the national health database. Progression of disease was defined according to iRECIST criteria. Progression-free survival was measured as the time from day one of the first cycle to the date of first registered progression, provided that this progression was subsequently confirmed by a second investigation.

The prospective sample size calculation was based on a pilot study of the first 120 chronologically included patients, which indicated that a minimum of 96 patients, including 12 positive cases, would be required. The calculations were based on an alpha value of 0.05, a beta value of 0.20, a positive-to-negative case ratio of 0.14, and an expected effect size corresponding to an AUC of 0.75. The final study included 220 patients, with at least 147, 75, and 37 survivors (positive cases) observed in the 6-, 12-, and 24-month groups, respectively. All actual parameters exceeded the initial sample size estimates, with positive cases making up proportions of 0.67, 0.34, and 0.17 of the 220 patients, respectively, and an achieved highest prognostic AUC of 0.86.

2.2. Image Acquisition

Chest CT images from 220 patients were acquired in uncompressed DICOM format (slice thickness ≤ 2.5 mm) using a Siemens Somatom Definition 128 CT scanner (Siemens, Munich, Germany) at the N.P. Napalkov Saint Petersburg Clinical Research and Practical Centre for Specialized Types of Medical Care (Oncological). The dataset excluded low-dose protocols, while the use of contrast was not a limiting factor. During database creation, the radiologist was responsible for data selection, interpretation, primary reporting, image labeling, and data deidentification. CT viewing and preliminary analysis were performed using RadiAnt DICOM Viewer v2025.1. Data processing involved preparing the CT datasets, image pre-processing, detecting pathological lesions, and segmentation. Contouring of pathological lung lesions was performed in 3D Slicer v.5.8.1, an open-source medical image analysis platform, using multiplanar reconstruction (MPR) of chest CT images. This process was guided by the initial radiology reports to ensure accuracy and consistency. Tumor VOIs were semi-automatically segmented slice-by-slice in the axial plane using the “Grow from Seeds” tool (FastGrowCut) in 3D Slicer’s Segment Editor by a radiation oncologist (F.M.) with 15 years of oncologic imaging experience. Threshold-based selection, paint, erase, and island removal tools were used to refine the segmentations. The final tumor VOIs were reviewed for consistency and exported in MRB format for further analysis. Figure 1 illustrates the final segmentation.

2.3. Feature Extraction

For the radiomics analysis, we used the open-source Python package Pyradiomics v3.1.0, which is compliant with the Imaging Biomarker Standardization Initiative (IBSI) standards [18]. Using the parameter file provided below, the software was configured to generate all available image transformations and feature types, resulting in a total of 2157 features computed per CT scan. Seven image transformations were applied: wavelet, square, square root, logarithm, gradient, exponential, and Laplacian of Gaussian (LoG).

To reduce inter-scan variability, the CT scans were first z-score normalized (mean intensity = 0, standard deviation = 1) and then resampled to an isotropic voxel size of 1 × 1 × 1 mm using the ‘sitkBSpline’ interpolator. Radiomic features were extracted only from tumor VOIs, yielding 107 standard shape, intensity (first-order), and texture (second-order) features from the original images, while higher-order features were computed from the transformed images.
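The z-score normalization step can be illustrated with a minimal sketch. The function name, the toy intensities, and the use of the population standard deviation are assumptions made for illustration only; in the study, normalization was carried out inside Pyradiomics, where the `scale` argument below would correspond to the `normalizeScale` setting of the parameter file.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities, scale=1.0):
    """Centre intensities on 0, scale to unit SD, then multiply by `scale`."""
    mu = fmean(intensities)
    sigma = pstdev(intensities)  # population SD: an assumption of this sketch
    if sigma == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [scale * (v - mu) / sigma for v in intensities]


# Toy intensity values (hypothetical, not study data)
voxels = [-850.0, -120.0, 35.0, 60.0, 410.0]
normalized = zscore_normalize(voxels, scale=600)
```

With `scale=1.0` the output has a standard deviation of 1; a larger scale stretches the normalized intensities so that a fixed bin width still yields a useful number of gray levels.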

The bin width was individually determined for each filter type in a pilot analysis of 75 CTs to keep the number of gray-level bins per scan between 30 and 130, which is considered optimal for textural reproducibility without causing over-smoothing or excessive noise sensitivity [19]. This was necessary because the seven image filter transformations produced images with distinct pixel intensity ranges, requiring separate bin-width settings for each filter to maintain a consistent bin count across all individual scans. For detailed descriptions of the extracted radiomic features, please refer to https://pyradiomics.readthedocs.io/en/latest/features.html (accessed on 25 May 2025).
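The relationship between bin width and bin count can be sketched as follows. This is an illustrative helper, not the Pyradiomics implementation: it assumes bin edges aligned to multiples of the bin width, and the intensity range used in the example is hypothetical.

```python
import math


def gray_level_bin_count(min_intensity, max_intensity, bin_width):
    """Number of fixed-width bins needed to cover [min, max].

    Bin edges are assumed to sit on multiples of bin_width.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    first_bin = math.floor(min_intensity / bin_width)
    last_bin = math.floor(max_intensity / bin_width)
    return last_bin - first_bin + 1


def in_target_range(n_bins, low=30, high=130):
    """Check the 30 to 130 bin window targeted in the pilot analysis."""
    return low <= n_bins <= high


# Hypothetical intensity range inside one tumor VOI
n_bins = gray_level_bin_count(-900.0, 1500.0, bin_width=30.0)
ok = in_target_range(n_bins)
```

Because each filter rescales intensities differently, the same check would be repeated per filter with its own candidate bin width until the count falls inside the target window.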

Below is the Params.yaml file:

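setting:
    normalize: true                     # z-score normalize image intensities
    normalizeScale: 600                 # scale factor applied to the normalized intensities
    resampledPixelSpacing: [1, 1, 1]    # resample to isotropic 1 mm voxels
    interpolator: 'sitkBSpline'         # B-spline interpolation during resampling
    voxelArrayShift: 1000               # intensity shift used by energy-type first-order features
    binWidth: 30.0                      # default gray-level bin width
    label: 2                            # mask label value of the tumor VOI
imageType:                              # per-filter bin widths override the default above
    Original:
    LoG:
        sigma: [1.0, 2.0, 3.0, 4.0, 5.0]
        binWidth: 15.0
    Wavelet:
        binWidth: 8.0
    Square:
        binWidth: 15
    SquareRoot:
        binWidth: 25
    Logarithm:
        binWidth: 50
    Exponential:
        binWidth: 6
    Gradient:
        binWidth: 14
featureClass:                           # an empty entry enables all features of that class
    glcm:
    firstorder:
    shape2D:
    shape:
    glrlm:
    glszm:
    gldm:
    ngtdm: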

2.4. Model Selection

The data were partitioned based on the chronological order of patient inclusion into training (n = 110), validation (n = 44), and test (n = 66) cohorts at a ratio of 5:2:3, for model training, validation, and independent evaluation, respectively. The endpoints were defined as binary outcomes: 6-month PFS, 12-month PFS, and 24-month OS. Supervised machine learning modeling was performed using the FeAture Explorer (FAE) v0.6.0 Python package with NumPy, pandas, and scikit-learn [20]. The source code is openly available on GitHub (https://github.com/salan668/FAE.git (accessed on 25 May 2025)).

The machine learning pipeline began with CSV files containing a binary outcome column and either CP-only, radiomics-only, or CP and radiomics feature columns. All features underwent identical normalization, feature selection, and classification procedures. Data balancing was performed through upsampling. At each step (preprocessing, feature selection, and classification), only one method was applied, rather than combining multiple methods simultaneously. Feature pre-selection involved either discarding features with a Pearson’s correlation coefficient above 0.97 or, alternatively, applying principal component analysis. Feature selection was then performed using one of the following methods: ANOVA, Kruskal–Wallis (KW), Recursive Feature Elimination (RFE), or Relief. The selected features were used as input for one of these classifiers: support vector machine (SVM), linear discriminant analysis (LDA), logistic regression (LR), AdaBoost, Gaussian process (GP), multilayer perceptron (MLP), random forest (RF), least absolute shrinkage and selection operator (LASSO), decision tree (DT), or naïve Bayes (NB). The script was set to select between two and eight features. All possible combinations of the above components resulted in 1680 radiomics pipelines, trained on the development set and evaluated on the test dataset.

Parameters such as slope, intercept, weight coefficients, and support vectors are learned from the data during training, whereas hyperparameters are not derived from the data. Instead, hyperparameters were tuned via grid search based on the model’s performance on validation sets during cross-validation.
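The size of the pipeline grid described in this section can be checked with a short enumeration. This is only a counting sketch, not the FAE implementation: the labels are plain strings, and the use of three normalization methods is an assumption taken from the workflow description in Section 3.2.

```python
from itertools import product

# Component labels only; no models are trained in this sketch.
normalizers = ["min-max", "z-score", "mean"]  # three methods assumed, per Section 3.2
reducers = ["PCC", "PCA"]                      # correlation filter or principal components
selectors = ["ANOVA", "KW", "RFE", "Relief"]
classifiers = ["SVM", "LDA", "LR", "AdaBoost", "GP",
               "MLP", "RF", "LASSO", "DT", "NB"]
feature_counts = range(2, 9)                   # two to eight selected features

pipelines = list(product(normalizers, reducers, selectors,
                         classifiers, feature_counts))
n_pipelines = len(pipelines)                   # 3 * 2 * 4 * 10 * 7
```

Each tuple in `pipelines` identifies one candidate workflow, so a downstream loop could train and score every combination under identical conditions.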

2.5. Ensemble Modeling

The probability scores delivered by each model were standardized to a mean of 0 and a standard deviation of 1 using z-score normalization. The top 15 prognostic models in the test set were then integrated by averaging their continuous normalized scores to generate an overall ensemble score. These ensembles were subsequently also evaluated on the reserved test set. We tested varying numbers of models in the ensemble during preliminary experiments and found that including more than approximately 15 models no longer improved prognostic performance. “Soft” voting, which aggregates continuous probability scores, was used instead of hard voting on binarized class labels to avoid biases introduced by early categorization.
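The soft-voting step described above can be sketched in a few lines. The function names and the example scores are hypothetical, and the sketch assumes that every model scores the same patients in the same order; it is not the code used in the study.

```python
from statistics import fmean, pstdev


def zscore(scores):
    """Standardize one model's scores to mean 0 and SD 1."""
    mu, sigma = fmean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]  # a constant model carries no ranking information
    return [(s - mu) / sigma for s in scores]


def soft_vote(model_scores):
    """Average the standardized scores across models, patient by patient."""
    standardized = [zscore(scores) for scores in model_scores]
    return [fmean(per_patient) for per_patient in zip(*standardized)]


# Hypothetical probability scores from three models for four patients
model_scores = [
    [0.10, 0.40, 0.60, 0.90],
    [0.20, 0.30, 0.70, 0.80],
    [0.05, 0.55, 0.45, 0.95],
]
ensemble = soft_vote(model_scores)
```

Standardizing before averaging keeps a model with a wide score range from dominating the ensemble, and no per-model threshold is needed until the final ensemble score is evaluated.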

2.6. Statistical Analysis

For each combination of features and endpoints, binary classification performance was evaluated on the reserved test set using the FAE Python package. ROC analysis (AUC), the F1 score, and the Youden index were used to assess overall discrimination ability, while accuracy and balanced accuracy measured general correctness, with the latter accounting for class imbalance. True positives and true negatives were reported as basic classification counts, and the Matthews correlation coefficient (MCC) served as a robust summary measure across all confusion matrix elements, especially in imbalanced datasets. Together, these metrics offer a wide and complementary evaluation of model performance.
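For reference, the threshold-dependent metrics listed above can all be derived from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and do not reproduce study results, and all denominators are assumed to be non-zero apart from the guarded MCC case.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "youden": sensitivity + specificity - 1,
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Hypothetical counts for a 66-patient test set (not study data)
metrics = classification_metrics(tp=8, fp=6, tn=49, fn=3)
```

With imbalanced classes such as these, accuracy is dominated by the majority class, which is why balanced accuracy and MCC are reported alongside it.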

2.7. Validation

Model performance was evaluated in two stages. First, ten-fold cross-validation was performed on the development set, using an internal validation subset within each fold. Next, the best-performing models in the test set were combined, and the resulting ensembles were tested again on the test set, which comprised the chronologically most recent 30% of patients. To prevent information leakage, the FAE radiomics pipeline was finalized prior to evaluation on the hold-out test subset.
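The chronological hold-out can be illustrated with a minimal sketch. The function name and the integer patient identifiers are assumptions for illustration; the only requirement is that records are already sorted by date of inclusion.

```python
def chronological_holdout(records, test_fraction=0.3):
    """Hold out the most recent share of records as the test set.

    `records` must already be sorted by inclusion date, oldest first.
    """
    n_test = round(len(records) * test_fraction)
    split = len(records) - n_test
    return records[:split], records[split:]


# Hypothetical patient identifiers in order of inclusion
patient_ids = list(range(1, 221))
development, test = chronological_holdout(patient_ids)
```

Unlike a random split, every test patient here was included after every development patient, which is what allows the evaluation to mimic prediction for future patients.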

3. Results

3.1. Patient Characteristics

We included 220 patients with inoperable NSCLC who received checkpoint inhibitors in a real-world setting (Table 1). The majority of patients were male (78.5%) and had stage IV disease (72.3%). Approximately half were smokers (46.4%) and had non-squamous histology (49.5%).

The vast majority of patients received either an atezolizumab-based combination (48.3%) or pembrolizumab (36.5%; Table 2). A response was achieved in 15.6% of patients. The median progression-free survival was 8.2 months (6.8–9.5), similar to that observed in registrational trials, and the median overall survival was 22.0 months (19.6–24.4; Table 2) [21].

3.2. Experimental Design

Figure 1 outlines the study workflow, which included CT scans from 220 patients. Briefly, patient and imaging data were curated, and 1680 CT-based radiomics models were trained, validated, and tested. These models were generated by combining three normalization methods, two dimensionality reduction techniques, four feature selection methods, and ten classifiers, with the number of selected features restricted to between two and eight (Figure 1). The split into development and test sets was based on the chronological order of patient inclusion, as detailed in the Methods section. This approach simulated a real-world clinical scenario in which retrospective data are used to predict outcomes for future patients from the same institution.

3.3. Performance of Individual Models

The best-performing individual models in the test set were then combined into an ensemble and evaluated in the reserved test set for stratification of PFS and OS following immune checkpoint inhibitor therapy. Models were developed for 6-, 12-, and 24-month endpoints, and the evaluation metrics included AUC, accuracy, balanced accuracy, true positives, true negatives, MCC, F1 score and the Youden index.

Among the top 15 individual models for the 24-month endpoint that included CP and radiomics features, the most frequent normalization methods were mean (40%) and min-max (33%). The most frequent preprocessing methods were PCC (53%) and PCA (47%). The most frequent feature selectors were Relief (60%) and Kruskal–Wallis (20%), while the most frequent classifiers were AdaBoost (60%) and Auto-Encoder (12%). Insufficient diversification due to AdaBoost dominance is unlikely, because the three upstream optimization layers introduced additional variability in the selection of features entering AdaBoost. Figure 2 displays the classification evaluation of the individual models that best prognosticated 24-month overall survival in the test set. Applying a 24-month survival endpoint allowed us to define a patient subgroup whose PFS exceeded double that of the remaining cohort, highlighting those who derived maximal benefit from immunotherapy. As expected, individual models exhibited much higher prognostic performance on the training set than on the validation and test sets (Figure 2). This performance gap reflects the degree of overfitting and serves as an indicator of each model’s generalizability.

We addressed potential temporal confounding by comparing prognostic performance in the chronologically selected test set with that in randomly selected validation subsets. This comparison provided a direct way to assess time-related bias. Among the 15 top-performing models for the 24-month endpoint that combined CP and radiomics features, the average AUC in the randomly selected validation subsets was 0.74 (SD 0.12, 95% CI 0.67 to 0.82), while in the chronologically selected test set the average AUC was 0.64 (SD 0.03, 95% CI 0.62 to 0.66). These AUC values refer to averages of individual models, not to the ensemble models. The AUC values obtained in the validation and test sets were significantly different, with a t-statistic of 2.81 and p = 0.015 by an independent two-sample Welch t-test. Although the chronological design reduced performance, it remains the only clinically relevant internal modeling approach, because it mimics prospective prognostication within the same institution using retrospective data.
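For readers unfamiliar with the Welch test, the statistic and its degrees of freedom can be computed as in the sketch below. The AUC lists are hypothetical placeholders chosen only to make the example run; they are not the per-model values from this study, and the sketch stops short of the p-value, which requires the t-distribution.

```python
import math
from statistics import fmean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (fmean(sample_a) - fmean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-model AUCs (illustrative only)
validation_auc = [0.60, 0.85, 0.70, 0.90, 0.65]
test_auc = [0.62, 0.66, 0.64, 0.63, 0.65]
t_stat, dof = welch_t(validation_auc, test_auc)
```

The Welch form is appropriate here because the two groups of AUCs have clearly unequal variances, so a pooled-variance t-test would be miscalibrated.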

3.4. Performance of Ensemble Models

Ensembles of the 15 best-performing models in the test set for each feature combination outperformed any individual model, except for the smallest model based solely on CP features (Figure 2 and Figure 3). This ensemble effect was most pronounced with the pooled CP and radiomics features for the 24-month endpoint, likely because the diversity of features provided complementary information that greatly enhanced prognostic performance (Table 3 and Figure 3). In contrast to the individual models (Figure 2), the ensemble results (Table 3 and Figure 3) achieved superior prognostic performance when combining clinicopathological and radiomics features. Notably, the best-performing ensemble was obtained by using the 24-month endpoint (Table 3).

Figure 3 illustrates the good discrimination of the ensemble prognosticator that includes radiomics features and the excellent performance of the ensemble combining both CP and radiomics features. The continuous scores of the combined CP and radiomics prognosticator isolate a fully homogeneous group of non-survivors, comprising 42% of all patients, without a single survivor (Figure 3).
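One way to quantify such a survivor-free group is to take the lowest ensemble score observed among survivors as a cut-off and count the patients below it. The sketch below is an illustrative interpretation with hypothetical scores and outcomes; the function name is an assumption, and it is not the procedure used to produce Figure 3.

```python
def pure_nonsurvivor_group(scores, survived):
    """Return the lowest survivor score and the fraction of patients below it.

    Every patient scoring below the returned cut-off is a non-survivor.
    """
    cutoff = min(s for s, alive in zip(scores, survived) if alive)
    n_below = sum(1 for s in scores if s < cutoff)
    return cutoff, n_below / len(scores)


# Hypothetical ensemble scores and 24-month survival flags
scores = [-1.2, -0.9, -0.5, -0.1, 0.2, 0.4, 0.8, 1.1, 1.5, 1.9]
survived = [False, False, False, False, False, True, False, True, False, True]
cutoff, fraction = pure_nonsurvivor_group(scores, survived)
```

Because the cut-off is chosen on the same data it is evaluated on, the resulting fraction is optimistic and would need confirmation in an independent cohort.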

4. Discussion

Computational analysis transforms clinical imaging into a rich source of quantitative features, gaining importance as imaging becomes more widely available and computational power continues to increase [4,22]. This study applied a machine learning-based radiomics approach to address the pressing clinical need for improved prediction of immunotherapy outcomes in NSCLC patients.

We hypothesized that the full prognostic potential of CT-derived tumor morphology has not been reached, largely due to limited experimentation within radiomics workflows, especially the underuse of ensemble modeling techniques. To address this gap, we implemented a stabilizing radiomics strategy that generates a diverse array of models and integrates them into a robust ensemble. This approach was aimed at enhancing predictive accuracy and minimizing the biases associated with individual models.

One of the key advantages of our study was the optimization of the radiomics modeling pipeline, resulting in the generation of 1680 prognostic models. This extensive exploration allowed the selection and ensembling of the top-performing models in the test set, thereby capturing complementary information across them and leading to improved performance. The ensemble models were then tested on the previously unobserved test dataset. This approach helps to reduce the bias associated with any individual model and to improve the classification performance in the test set [20,23]. In preliminary optimization, models using only one predictor underperformed those combining two or more features, while increasing the number of features beyond eight did not result in a notable improvement in performance. Allowing models to include up to 30 features would have permitted more extensive pipeline exploration, producing approximately 10,000 candidate models; however, we prioritized robustness by restricting models to a small number of features for classifier construction. Other studies also used a small number of features for classifier construction to improve generalizability [24].

Our best-performing ensemble model, which combined clinical parameters and radiomics features, achieved an AUC of 0.86 and a C-index of 0.84 in predicting 24-month overall survival, surpassing most previously published results. This is likely attributable to the fact that only a few previous studies have utilized ensemble methods. For comparison, the study by Upadhaya et al. reached AUCs up to 0.67 for predicting two-year overall survival in lung cancer patients, despite using an advanced ensemble methodology based on foundational artificial intelligence in addition to radiomics [8], while an ensemble deep learning strategy by Saad et al. achieved a C-index of 0.75 [25]. A deep learning ensemble model developed to predict recurrence of NSCLC at 12 months reached average AUCs up to 0.77 across validation folds, while test set results were not reported [26]. A predictive performance similar to that of our study was achieved by Gong et al., with AUCs of 0.89 and 0.85 in two validation cohorts employing a CT radiomics-based ensemble model [14]. However, its endpoint was not survival but brain metastasis occurrence [14]. Liu et al. employed a CT-based radiomics model to predict PFS and OS in NSCLC patients treated with nivolumab [27]. In their study, the average AUC values for predicting PFS and OS were 0.73 and 0.61, respectively. Multiple other efforts utilizing deep learning or logistic regression [5] have achieved AUCs ranging from 0.7 to 0.9 based on sample sizes between 48 and 1135 patients [7,28,29,30]. An AUC of 0.823 was achieved by extracting radiomics features from both primary tumors and lymph nodes on 18F-FDG PET imaging to predict pathological complete response to neoadjuvant chemoimmunotherapy in NSCLC patients [6].

Similarly to the current study’s findings, Fried et al. combined pretreatment CT texture features with conventional prognostic features [31]. Models that integrated both textural features and conventional features demonstrated improved risk stratification for overall survival compared with models based solely on conventional features. In line with our current results, incorporating patient demographics and clinical factors alongside radiomics features has repeatedly been shown to enhance the predictive power of machine learning models [32]. Another study reported that combining radiomics with clinical features yielded AUCs of only up to 0.60 for predicting four-year progression-free and overall survival in NSCLC patients treated with nivolumab or pembrolizumab [15]. One study integrating clinical data and deep radiomics predicted survival in an independent test set after immune checkpoint inhibitor treatment with AUCs of 0.824 and 0.753 for six- and nine-month survival, respectively [33].

One limitation of radiomics analysis is the difficulty in interpreting most features. Although many of our top-performing models included PCA-derived features, the best model that combined both clinicopathological and radiomics features did not rely on PCA, which allowed one of the four features in this model to be interpreted easily as the simple intensity minimum of the original images. In this case, a higher minimum intensity (resulting in a lighter image) is associated with a better outcome. PCA features are not interpretable because they are linear combinations of multiple original features. However, both PCA and image filter transformations (such as wavelet, gradient, and logarithm) add only marginal opacity, because most texture radiomics features are inherently abstract, being higher-order mathematical constructs that are often applied to transformed images. Therefore, radiomics accepts limited interpretability in favor of performance [34]. PCA-derived and native radiomics features were handled identically during feature selection and classification, ensuring no interpretability bias in the final models.

Another limitation of many radiomics studies is low reproducibility, due to a lack of standardization, insufficient reporting, or the absence of open-source code. We addressed this by using open-source code and providing the parameters file used for radiomics feature extraction. Additionally, we ensembled the best individual models to reduce variability in performance, which should further enhance reproducibility. Nevertheless, despite the objective nature of the computational analysis, the workflow of this study retained residual subjectivity due to the semi-automatic tumor VOI segmentation. Segmentation by a single experienced radiologist might also have affected model generalizability. Although intra- and interobserver variability were not assessed in this study, the semi-automatic 3D Slicer-based segmentation method used here has demonstrated high interobserver consistency in NSCLC [35]. Moreover, ensemble-based radiomics models with multistep feature selection have shown resilience to small segmentation variability [36].

Furthermore, radiomics studies are often based on retrospectively collected data and thus mainly serve as proof-of-concept, while prospective studies are needed to confirm the value of radiomics. To address this limitation, we divided the development and test sets in sequential chronological order, mirroring future routine clinical application of this methodology, in which the outcomes of current patients would be predicted using retrospective data.
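
The chronological division described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the record layout, the `scan_date` field, the patient identifiers, and the dates are all hypothetical, and the 70:30 ratio is taken from the description of the development and test sets.

```python
from datetime import date


def chronological_split(records, dev_fraction=0.7):
    """Sort records by scan date; the earliest go to development, the latest to test."""
    ordered = sorted(records, key=lambda r: r["scan_date"])
    cut = round(len(ordered) * dev_fraction)
    return ordered[:cut], ordered[cut:]


# Ten hypothetical patients, deliberately listed out of date order.
patients = [
    {"id": "P07", "scan_date": date(2021, 9, 14)},
    {"id": "P01", "scan_date": date(2019, 3, 2)},
    {"id": "P10", "scan_date": date(2022, 11, 30)},
    {"id": "P04", "scan_date": date(2020, 5, 21)},
    {"id": "P02", "scan_date": date(2019, 8, 17)},
    {"id": "P09", "scan_date": date(2022, 6, 8)},
    {"id": "P05", "scan_date": date(2020, 12, 1)},
    {"id": "P03", "scan_date": date(2020, 1, 10)},
    {"id": "P08", "scan_date": date(2022, 2, 25)},
    {"id": "P06", "scan_date": date(2021, 4, 4)},
]

development, test = chronological_split(patients)
print([p["id"] for p in development])
print([p["id"] for p in test])
```

Unlike a random split, this ordering guarantees that every test patient was scanned no earlier than any development patient, so the model is never evaluated on patients who precede its training data.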

5. Conclusions

We developed and evaluated multiple ensemble radiomics models to prognosticate OS and predict PFS in NSCLC by utilizing handcrafted radiomics signatures, clinical factors, and their combination. By designating the most recent CT scans as an internal test set, we simulated a realistic clinical scenario, whereby a model trained on retrospective data predicts outcomes for prospectively treated patients at the same institution. Our ensemble radiomics framework mitigates the limitations of single-model variability and enhances both generalizability and accuracy, thereby offering a template for more reproducible and accurate predictions of immunotherapy outcomes in future studies.

The achieved improvement in prognostic accuracy could enable personalized treatment through more reliable selection of NSCLC patients who are likely to benefit from immunotherapy. Further validation of pretreatment radiomics-based patient stratification should be pursued using larger and external datasets to confirm the wider clinical applicability of this analytical workflow. Additionally, integrating radiomics with complementary modalities such as pathomics, genomics, proteomics, deep radiomics, and emerging markers like TMB and ctDNA could further enhance prognostic accuracy and improve treatment outcomes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers17111790/s1, manuscript-supplementary.xlsx.

Author Contributions

Conceptualization, F.M., M.R., A.B., N.V. and V.M.; methodology, M.R. and F.M.; software, M.R.; validation, M.R.; formal analysis, F.M. and M.R.; investigation, F.M., N.T., V.C. (Vera Chernobrivceva), A.G., A.O., M.M., E.E., M.K., D.B., E.A., V.K., N.L. and V.C. (Viacheslav Chubenko); resources, N.L., V.C. (Viacheslav Chubenko), V.E., N.V., A.B. and V.M.; data curation, F.M. and M.R.; writing—original draft preparation, M.R. and F.M.; writing—review and editing, M.R.; visualization, M.R.; project administration, F.M. and M.R.; funding acquisition, V.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science, Technological Development and Innovation of the Republic of Serbia (Agreement No. 451-03-136/2025-03/200043).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the N.P Napalkov Saint Petersburg Clinical Research and Practical Centre for Specialized Types of Medical Care (Oncological), (Approval No. 3, dated 14 March 2023).

Informed Consent Statement

Informed consent was obtained from all the subjects involved in this study. Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement

The raw data supporting the conclusions of this article are available in Supplementary Materials. The FAE source code is openly available on GitHub (https://github.com/salan668/FAE.git (accessed on 25 May 2025)).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

NLR	Neutrophil-to-Lymphocyte Ratio
CR	Complete Response
PR	Partial Response
SD	Stable Disease
PD	Progressive Disease
RR	Response Rate
DCR	Disease Control Rate
CP	Clinical Parameters
AUC	Area Under the Curve
TP	True Positives
TN	True Negatives
MCC	Matthews Correlation Coefficient

References

1. Buono, M.; Russo, G.; Nardone, V.; Della Corte, C.M.; Natale, G.; Rubini, D.; Palumbo, L.; Scimone, C.; Ciani, G.; D’Onofrio, I.; et al. New perspectives on inoperable early-stage lung cancer management: Clinicians, physicists, and biologists unveil strategies and insights. J. Liq. Biopsy 2024, 5, 100153.
2. Wolf, E.; Sacchi de Camargo Correia, G.; Li, S.; Zhao, Y.; Manochakian, R.; Lou, Y. Emerging Immunotherapies for Advanced Non-Small-Cell Lung Cancer. Vaccines 2025, 13, 128.
3. Kolarevic, D.; Tomasevic, Z.; Dzodic, R.; Kanjer, K.; Vukosavljevic, D.N.; Radulovic, M. Early prognosis of metastasis risk in inflammatory breast cancer by texture analysis of tumour microscopic images. Biomed. Microdevices 2015, 17, 92.
4. Speckter, H.; Radulovic, M.; Trivodaliev, K.; Vranes, V.; Joaquin, J.; Hernandez, W.; Mota, A.; Bido, J.; Hernandez, G.; Rivera, D.; et al. MRI radiomics in the prediction of the volumetric response in meningiomas after gamma knife radiosurgery. J. Neurooncol. 2022, 159, 281–291.
5. Ligero, M.; Gielen, B.; Navarro, V.; Cresta Morgado, P.; Prior, O.; Dienstmann, R.; Nuciforo, P.; Trebeschi, S.; Beets-Tan, R.; Sala, E.; et al. A whirl of radiomics-based biomarkers in cancer immunotherapy, why is large scale validation still lacking? NPJ Precis. Oncol. 2024, 8, 42.
6. Liu, X.; Ji, Z.; Zhang, L.; Li, L.; Xu, W.; Su, Q. Prediction of pathological complete response to neoadjuvant chemoimmunotherapy in non-small cell lung cancer using (18)F-FDG PET radiomics features of primary tumour and lymph nodes. BMC Cancer 2025, 25, 520.
7. Chen, X.; Meng, F.; Zhang, P.; Wang, L.; Yao, S.; An, C.; Li, H.; Zhang, D.; Li, H.; Li, J.; et al. Establishing a deep learning model that integrates pre- and mid-treatment computed tomography to predict treatment response for non-small cell lung cancer. Int. J. Radiat. Oncol. Biol. Phys. 2025; in press.
8. Upadhaya, T.; Chetty, I.J.; McKenzie, E.M.; Bagher-Ebadian, H.; Atkins, K.M. Application of CT-based foundational artificial intelligence and radiomics models for prediction of survival for lung cancer patients treated on the NRG/RTOG 0617 clinical trial. BJR Open 2024, 6, tzae038.
9. Zandberg, D.P.; Zenkin, S.; Ak, M.; Mamindla, P.; Peddagangireddy, V.; Hsieh, R.; Anderson, J.L.; Delgoffe, G.M.; Menk, A.; Skinner, H.D.; et al. Evaluation of radiomics as a predictor of efficacy and the tumor immune microenvironment in anti-PD-1 mAb treated recurrent/metastatic squamous cell carcinoma of the head and neck patients. Head Neck 2025, 47, 129–138.
10. Peng, J.; Zou, D.; Zhang, X.; Ma, H.; Han, L.; Yao, B. A novel sub-regional radiomics model to predict immunotherapy response in non-small cell lung carcinoma. J. Transl. Med. 2024, 22, 87.
11. Liao, C.Y.; Chen, Y.M.; Wu, Y.T.; Chao, H.S.; Chiu, H.Y.; Wang, T.W.; Chen, J.R.; Shiao, T.H.; Lu, C.F. Personalized prediction of immunotherapy response in lung cancer patients using advanced radiomics and deep learning. Cancer Imaging 2024, 24, 129.
12. Sun, R.; Sundahl, N.; Hecht, M.; Putz, F.; Lancia, A.; Rouyar, A.; Milic, M.; Carre, A.; Battistella, E.; Alvarez Andres, E.; et al. Radiomics to predict outcomes and abscopal response of patients with cancer treated with immunotherapy combined with radiotherapy using a validated signature of CD8 cells. J. Immunother. Cancer 2020, 8, e001429.
13. Clyne, M.; Offman, J.; Shanley, S.; Virgo, J.D.; Radulovic, M.; Wang, Y.; Ardern-Jones, A.; Eeles, R.; Hoffmann, E.; Yu, V.P. The G67E mutation in hMLH1 is associated with an unusual presentation of Lynch syndrome. Br. J. Cancer 2009, 100, 376–380.
14. Gong, J.; Wang, T.; Wang, Z.; Chu, X.; Hu, T.; Li, M.; Peng, W.; Feng, F.; Tong, T.; Gu, Y. Enhancing brain metastasis prediction in non-small cell lung cancer: A deep learning-based segmentation and CT radiomics-based ensemble learning model. Cancer Imaging 2024, 24, 1.
15. Yolchuyeva, S.; Giacomazzi, E.; Tonneau, M.; Lamaze, F.; Orain, M.; Coulombe, F.; Malo, J.; Belkaid, W.; Routy, B.; Joubert, P.; et al. Imaging-Based Biomarkers Predict Programmed Death-Ligand 1 and Survival Outcomes in Advanced NSCLC Treated With Nivolumab and Pembrolizumab: A Multi-Institutional Study. JTO Clin. Res. Rep. 2023, 4, 100602.
16. Tang, F.H.; Fong, Y.W.; Yung, S.H.; Wong, C.K.; Tu, C.L.; Chan, M.T. Radiomics-Clinical AI Model with Probability Weighted Strategy for Prognosis Prediction in Non-Small Cell Lung Cancer. Biomedicines 2023, 11, 2093.
17. Lin, S.; Ma, Z.; Yao, Y.; Huang, H.; Chen, W.; Tang, D.; Gao, W. Automatic machine learning accurately predicts the efficacy of immunotherapy for patients with inoperable advanced non-small cell lung cancer using a computed tomography-based radiomics model. Diagn. Interv. Radiol. 2025, 31, 130–140.
18. van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillion-Robin, J.C.; Pieper, S.; Aerts, H. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77, e104–e107.
19. Rodrigues, A.; Rodrigues, N.; Santinha, J.; Lisitskaya, M.V.; Uysal, A.; Matos, C.; Domingues, I.; Papanikolaou, N. Value of handcrafted and deep radiomic features towards training robust machine learning classifiers for prediction of prostate cancer disease aggressiveness. Sci. Rep. 2023, 13, 6206.
20. Song, Y.; Zhang, J.; Zhang, Y.D.; Hou, Y.; Yan, X.; Wang, Y.; Zhou, M.; Yao, Y.F.; Yang, G. FeAture Explorer (FAE): A tool for developing and comparing radiomics models. PLoS ONE 2020, 15, e0237587.
21. Rodriguez-Abreu, D.; Powell, S.F.; Hochmair, M.J.; Gadgeel, S.; Esteban, E.; Felip, E.; Speranza, G.; De Angelis, F.; Domine, M.; Cheng, S.Y.; et al. Pemetrexed plus platinum with or without pembrolizumab in patients with previously untreated metastatic nonsquamous NSCLC: Protocol-specified final analysis from KEYNOTE-189. Ann. Oncol. 2021, 32, 881–895.
22. Rajkovic, N.; Li, X.; Plataniotis, K.N.; Kanjer, K.; Radulovic, M.; Milosevic, N.T. The Pan-Cytokeratin Staining Intensity and Fractal Computational Analysis of Breast Tumor Malignant Growth Patterns Prognosticate the Occurrence of Distant Metastasis. Front. Oncol. 2018, 8, 348.
23. Castillo, T.J.; Starmans, M.P.A.; Arif, M.; Niessen, W.J.; Klein, S.; Bangma, C.H.; Schoots, I.G.; Veenland, J.F. A Multi-Center, Multi-Vendor Study to Evaluate the Generalizability of a Radiomics Model for Classifying Prostate cancer: High Grade vs. Low Grade. Diagnostics 2021, 11, 369.
24. Zheng, R.; Shi, C.; Wang, C.; Shi, N.; Qiu, T.; Chen, W.; Shi, Y.; Wang, H. Imaging-Based Staging of Hepatic Fibrosis in Patients with Hepatitis B: A Dynamic Radiomics Model Based on Gd-EOB-DTPA-Enhanced MRI. Biomolecules 2021, 11, 307.
25. Saad, M.B.; Hong, L.; Aminu, M.; Vokes, N.I.; Chen, P.; Salehjahromi, M.; Qin, K.; Sujit, S.J.; Lu, X.; Young, E.; et al. Predicting benefit from immune checkpoint inhibitors in patients with non-small-cell lung cancer by CT-based ensemble deep learning: A retrospective study. Lancet Digit. Health 2023, 5, e404–e420.
26. Kim, G.; Moon, S.; Choi, J.H. Deep Learning with Multimodal Integration for Predicting Recurrence in Patients with Non-Small Cell Lung Cancer. Sensors 2022, 22, 6594.
27. Liu, C.; Gong, J.; Yu, H.; Liu, Q.; Wang, S.; Wang, J. A CT-Based Radiomics Approach to Predict Nivolumab Response in Advanced Non-Small-Cell Lung Cancer. Front. Oncol. 2021, 11, 544339.
28. Ren, Q.; Xiong, F.; Zhu, P.; Chang, X.; Wang, G.; He, N.; Jin, Q. Assessing the robustness of radiomics/deep learning approach in the identification of efficacy of anti-PD-1 treatment in advanced or metastatic non-small cell lung carcinoma patients. Front. Oncol. 2022, 12, 952749.
29. Bracci, S.; Dolciami, M.; Trobiani, C.; Izzo, A.; Pernazza, A.; D’Amati, G.; Manganaro, L.; Ricci, P. Quantitative CT texture analysis in predicting PD-L1 expression in locally advanced or metastatic NSCLC patients. Radiol. Med. 2021, 126, 1425–1433.
30. Zheng, Y.M.; Zhan, J.F.; Yuan, M.G.; Hou, F.; Jiang, G.; Wu, Z.J.; Dong, C. A CT-based radiomics signature for preoperative discrimination between high and low expression of programmed death ligand 1 in head and neck squamous cell carcinoma. Eur. J. Radiol. 2022, 146, 110093.
31. Fried, D.V.; Tucker, S.L.; Zhou, S.; Liao, Z.; Mawlawi, O.; Ibbott, G.; Court, L.E. Prognostic value and reproducibility of pretreatment CT texture features in stage III non-small cell lung cancer. Int. J. Radiat. Oncol. Biol. Phys. 2014, 90, 834–842.
32. Wang, T.; She, Y.; Yang, Y.; Liu, X.; Chen, S.; Zhong, Y.; Deng, J.; Zhao, M.; Sun, X.; Xie, D.; et al. Radiomics for Survival Risk Stratification of Clinical and Pathologic Stage IA Pure-Solid Non-Small Cell Lung Cancer. Radiology 2022, 302, 425–434.
33. Farina, B.; Guerra, A.D.R.; Bermejo-Pelaez, D.; Miras, C.P.; Peral, A.A.; Madueno, G.G.; Jaime, J.C.; Vilalta-Lacarra, A.; Perez, J.R.; Munoz-Barrutia, A.; et al. Integration of longitudinal deep-radiomics and clinical data improves the prediction of durable benefits to anti-PD-1/PD-L1 immunotherapy in advanced NSCLC patients. J. Transl. Med. 2023, 21, 174.
34. Demircioglu, A. Reproducibility and interpretability in radiomics: A critical assessment. Diagn. Interv. Radiol. 2024.
35. Velazquez, E.R.; Parmar, C.; Jermoumi, M.; Mak, R.H.; van Baardwijk, A.; Fennessy, F.M.; Lewis, J.H.; De Ruysscher, D.; Kikinis, R.; Lambin, P.; et al. Volumetric CT-based segmentation of NSCLC using 3D-Slicer. Sci. Rep. 2013, 3, 3529.
36. Cama, I.; Guzmán, A.; Campi, C.; Piana, M.; Lekadir, K.; Garbarino, S.; Díaz, O. Segmentation variability and radiomics stability for predicting Triple-Negative Breast Cancer subtype using Magnetic Resonance Imaging. arXiv 2025, arXiv:2504.01692.

Figure 1. Flowchart of CT-based prognostic model construction for prediction of lung cancer PFS or prognostication of OS. CT scans were acquired after diagnosis, and tumor VOIs were segmented. Prior to radiomics analysis, images underwent normalization, resampling to 1 × 1 × 1 mm, and interpolation. A total of 2157 radiomics features were extracted using PyRadiomics. The CP and radiomics features were combined by adding them to a CSV file along with the outcome column. The cohort was further divided into development and test sets in a 70:30 ratio. Feature values were normalized using z-score, mean normalization (to −0.5, 0.5), or min-max normalization (to 0, 1). Subsequently, data balancing, preprocessing, feature selection, grid search for optimal hyperparameters, and classification were performed. Feature selection and classification were conducted by 10-fold cross-validation in the development set. The final prognostic model was constructed by combining the 15 best-performing models into ensembles using soft voting.
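
The three feature-normalization options named in the Figure 1 caption can be sketched as below. This is a minimal illustration assuming the standard textbook definitions; the exact formulas in the software used for this study may differ (for example, in whether the population or sample standard deviation is used), and the function names and example values are hypothetical.

```python
import statistics


def z_score(values):
    """Center on the mean and scale by the population standard deviation (assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def mean_normalize(values):
    """Center on the mean and scale by the range (max - min)."""
    mean = statistics.mean(values)
    span = max(values) - min(values)
    return [(v - mean) / span for v in values]


def min_max(values):
    """Rescale linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one feature across five patients.
feature = [2.0, 4.0, 6.0, 8.0, 10.0]

print(z_score(feature))
print(mean_normalize(feature))
print(min_max(feature))
```

With this definition, mean normalization always yields values spanning an interval of width 1 around zero; the interval is exactly −0.5 to 0.5 only when the mean lies midway between the minimum and maximum, as in this symmetric example.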

Figure 2. The classification performance of the best-performing individual models in the test set for the 24-month endpoint. The models were based on CP features alone, radiomics features alone, or a combination of CP and radiomics features. Panels (a–c) show the prognostic evaluation of the best-performing model in the test set for each feature combination.

Figure 3. Classification performance of the ensembles for the 24-month endpoint using clinicopathological features alone, radiomics features alone, and a combined approach. The ensemble scores were calculated by summing the probability scores of the 15 best-performing models. Panel (a) shows the ROC analysis of the three ensemble scores. Panel (b) provides a visual representation of the classification between non-survivors (white) and survivors (colored), with prognostic ensemble scores ordered from lowest (left) to highest (right) continuous values. As the ensemble scores increase, the likelihood of patient survival also increases.
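
The ensemble score described in the Figure 3 caption, a sum of per-model probability scores, can be sketched as follows. This is an illustration only: it uses three hypothetical models and three hypothetical patients rather than the 15 models of the study, and the probabilities are invented.

```python
def ensemble_scores(model_probabilities):
    """Sum each patient's positive-class probability across all models."""
    return [sum(per_patient) for per_patient in zip(*model_probabilities)]


# Rows are models, columns are patients (hypothetical probabilities of survival).
model_probabilities = [
    [0.25, 0.75, 0.5],
    [0.5, 0.75, 0.25],
    [0.25, 1.0, 0.5],
]

scores = ensemble_scores(model_probabilities)
# Patient indices ordered from lowest to highest ensemble score, as in panel (b).
ranking = sorted(range(len(scores)), key=lambda i: scores[i])

print(scores)
print(ranking)
```

Dividing each sum by the number of models would give the mean probability of conventional soft voting; because this rescaling preserves the patient ordering, rank-based measures such as the AUC are unaffected by the choice.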

Table 1. Clinical characteristics of patients included in the study ^a.

	All Included (n = 220)	OS ≥ 24 Months (n = 52)	OS < 24 Months (n = 168)
Sex
Male	172 (78.5%)	38 (73.1%)	135 (80.4%)
Female	48 (21.5%)	14 (26.9%)	33 (19.6%)
Mean age (min-max)	63.3 (35–87)	64.6 (38–82)	62.9 (35–87)
Smoking
Yes	102 (46.4%)	31 (59.6%)	71 (42.3%)
No	118 (53.6%)	21 (40.4%)	97 (57.7%)
NLR
≥3	109 (49.5%)	25 (48.1%)	84 (50%)
Histology
Adenocarcinoma	109 (49.5%)	29 (55.8%)	80 (47.6%)
Squamous cell cancer	99 (45.0%)	21 (40.4%)	78 (46.4%)
Large cell	12 (5.5%)	2 (3.8%)	10 (6%)
Stage
IIIA	33 (15.0%)	9 (17.3%)	24 (14.3%)
IIIB	28 (12.7%)	6 (11.5%)	22 (13.1%)
IV	159 (72.3%)	37 (71.2%)	122 (72.6%)
Brain metastases
Present	16 (7.3%)	6 (11.5%)	10 (6.0%)
Liver metastases
Present	17 (7.7%)	2 (3.8%)	14 (8.4%)

^a Tumor mutational burden (TMB) and circulating tumor DNA (ctDNA) biomarkers were not included due to incomplete availability across the cohort. Abbreviations: NLR, Neutrophil-to-Lymphocyte Ratio.

Table 2. Treatment characteristics and results.

	All Included (n = 220)	OS ≥ 24 Months (n = 52)	OS < 24 Months (n = 168)
Drug
Atezolizumab	102 (48.3%)	19 (37.3%)	83 (51.9%)
Pembrolizumab	77 (36.5%)	20 (39.2%)	57 (35.6%)
Nivolumab	17 (8.1%)	3 (5.9%)	14 (8.8%)
Prolgolimab	15 (7.1%)	9 (17.6%)	6 (3.8%)
Best response
CR	3 (1.4%)	0 (0%)	3 (1.8%)
PR	26 (11.8%)	12 (23.1%)	14 (8.3%)
SD	47 (21.4%)	9 (17.3%)	38 (22.6%)
PD	110 (50.0%)	27 (51.9%)	83 (49.4%)
Not assessed	34 (15.4%)	4 (7.7%)	30 (17.9%)
RR	29 (15.6%)	12 (25%)	17 (12.3%)
DCR	76 (40.8%)	21 (43.7%)	55 (39.9%)
Median PFS (months) [95% CI]	8.2 [6.8–9.5]	15.4 [11.9–18.9]	6.97 [6.2–7.8]
Median OS (months) [95% CI]	22.0 [19.6–24.4]	33.2 [30.7–35.7]	14.5 [12.3–16.7]

Abbreviations: CR, Complete Response; PR, Partial Response; SD, Stable Disease; PD, Progressive Disease; RR, Response Rate; DCR, Disease Control Rate.

Table 3. Ensemble prognostic classification metrics were calculated on the reserved test set for models predicting 6-, 12-, and 24-month outcomes ^a,b,c.

Features	AUC	AUC 95%CI	AUC p-Value	Accuracy	Balanced Accuracy	TP (%)	TN (%)	Low-Risk Count	High-Risk Count	MCC	F1 Score	Youden Index
24-month overall survival
CP	0.671	0.525–0.818	0.043	0.76	0.60	41.7	83.1	59	12	0.23	0.37	0.20
RADIOMICS	0.796	0.666–0.927	0.000	0.82	0.79	55.0	92.2	51	20	0.52	0.62	0.57
CP + RADIOMICS	0.863	0.769–0.957	0.000	0.85	0.80	61.1	92.5	53	18	0.57	0.66	0.61
12-month progression-free survival
CP	0.669	0.540–0.798	0.016	0.66	0.65	58.6	71.4	42	29	0.30	0.58	0.30
RADIOMICS	0.727	0.625–0.862	0.001	0.67	0.69	57.9	78.8	33	38	0.37	0.65	0.38
CP + RADIOMICS	0.739	0.627–0.847	0.001	0.73	0.71	72.7	73.5	49	22	0.43	0.63	0.41
6-month progression-free survival
CP	0.675	0.542–0.812	0.015	0.73	0.63	78.6	53.3	15	56	0.29	0.822	0.26
RADIOMICS	0.701	0.565–0.837	0.009	0.72	0.64	79.2	50.0	18	53	0.28	0.81	0.27
CP + RADIOMICS	0.719	0.550–0.828	0.013	0.76	0.71	84.2	57.1	21	50	0.42	0.83	0.42

^a Prognostic evaluation was performed on the reserved internal test set, which comprised 30% of the patient cohort. ^b 1680 models were computed by combining three normalization methods, two preprocessing techniques, four feature selection methods, and ten classifiers. The models selected between two and eight features. ^c Bold formatting is used to highlight the best-performing model for each endpoint. Abbreviations: CP, clinical parameters; AUC, area under the curve; TP, true positives; TN, true negatives; MCC, Matthews Correlation Coefficient.
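
The model count in footnote b can be checked by enumerating the grid. The four listed components alone give 3 × 2 × 4 × 10 = 240 combinations; the sketch below assumes that the number of selected features (two to eight, i.e., seven values) acts as a fifth grid dimension, which reproduces the stated total of 1680. This reading is an assumption made for illustration, and the variable names are hypothetical.

```python
from itertools import product

# Component counts as stated in the footnote.
n_normalizers = 3
n_preprocessors = 2
n_selectors = 4
n_classifiers = 10
# Assumption: each feature count from 2 to 8 inclusive defines a separate model.
feature_counts = range(2, 9)

grid = list(
    product(
        range(n_normalizers),
        range(n_preprocessors),
        range(n_selectors),
        range(n_classifiers),
        feature_counts,
    )
)

base_combinations = n_normalizers * n_preprocessors * n_selectors * n_classifiers
print(base_combinations, len(feature_counts), len(grid))
```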

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

