Investigating the Radiomic Performance Gap Driven by Delineation Strategy: Radiotherapy Gross Tumor Volume vs. Dedicated Lesion Segmentation in Proton-Treated Adenoid Cystic Carcinoma

Fontana, Giulia; Thulasi Seetha, Sithin; Levante, Lorena; Bonora, Maria; Fichera, Cristina; Trombetta, Luca; Vischioni, Barbara; Dolcetti, Vincenzo; Molinelli, Silvia; Imparato, Sara; Orlandi, Ester

doi:10.3390/technologies14030144


by Giulia Fontana ^1,†, Sithin Thulasi Seetha ^1,*,†, Lorena Levante ^2,3, Maria Bonora ^4, Cristina Fichera ^2,3,‡, Luca Trombetta ^5,‡, Barbara Vischioni ^4, Vincenzo Dolcetti ^2, Silvia Molinelli ^5, Sara Imparato ^2 and Ester Orlandi ^3,4,*

^1 Clinical Department, (CNAO) National Center for Oncological Hadrontherapy, 27100 Pavia, Italy
^2 Radiology Unit, Clinical Department, (CNAO) National Center for Oncological Hadrontherapy, 27100 Pavia, Italy
^3 Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, 27100 Pavia, Italy
^4 Radiation Oncology Unit, Clinical Department, (CNAO) National Center for Oncological Hadrontherapy, 27100 Pavia, Italy
^5 Medical Physics Unit, Clinical Department, (CNAO) National Center for Oncological Hadrontherapy, 27100 Pavia, Italy
^* Authors to whom correspondence should be addressed.
^† These authors contributed equally to this work.
^‡ Affiliation at the time of the study.

Technologies 2026, 14(3), 144; https://doi.org/10.3390/technologies14030144

Submission received: 16 January 2026 / Revised: 24 February 2026 / Accepted: 26 February 2026 / Published: 28 February 2026

(This article belongs to the Special Issue Artificial Intelligence in Medical Radiation Science, Radiology and Radiation Oncology)


Abstract

This study investigates whether dedicated tumor segmentation for radiomics (TRAD) offers any advantage over gross tumor volume (GTV) in CT radiomics for predicting adenoid cystic carcinoma (ACC) progression after proton therapy (PT). Fifty-six patients with histologically proven salivary gland ACC were included, and 107 original features were extracted using PyRadiomics v3.1.0. Signatures were selected (n = 3) with sequential backward elimination using multiple classifiers, all optimized for improving cross-validated area under the ROC curve (AUC). Signature similarity was quantified using the Spearman correlation coefficient. Random forest (RF) yielded the best discriminative performance, with no statistical difference in AUCs between contour choices (GTV: 0.87 vs. TRAD: 0.80; ΔAUC_median = 0.0, p = 0.589). Time-to-event analysis confirmed both signatures stratified patients into distinct progression-free survival risk groups (Log-rank p < 0.0001) and demonstrated robust prognostic accuracy (GTV: C-index = 0.74, HR = 11.63; TRAD: C-index = 0.72, HR = 7.01). Biologically, GTV and TRAD signatures were borderline associated with perineural spread (p = 0.056) and solid tumor patterns (p = 0.053), respectively. Overall, CT-based radiomics models performed comparably across both segmentation strategies, supporting GTV as a practical and efficient alternative to TRAD for predicting ACC progression after PT.

Keywords:

radiomics; carcinoma; adenoid cystic; proton therapy; tumor segmentation; GTV; performance comparative study; artificial intelligence

1. Introduction

Digitalization has characterized most of the progress in healthcare over the past decades [1,2]. The rapid development of information technologies, combined with the availability of vast digitized data, has opened new possibilities for automation and precision medicine using tools such as artificial intelligence (AI) [3,4]. Within healthcare, radiotherapy (RT) has emerged as one of the most digitalized areas [5]. While clinical and treatment data are nowadays commonly in digital form, the widespread adoption of centralized tools (e.g., electronic medical records, commonly known as EMRs) is far from accomplished, even with encouraging information technology improvements in recent years [6,7]. In RT, however, a considerable amount of data is generated as part of image-guided radiotherapy (IGRT) procedures, from the treatment planning computed tomography (CT) scan to in-room setup verification X-rays and follow-up radiological examinations. These radiological images have been stored and archived in institutional centralized repositories (e.g., picture archiving and communication systems, or PACS) for more than a decade [8]. Hence, it is not surprising that radiomics has been widely explored in RT to date.

Radiomics leverages computational methods to convert radiological images into a vast amount of high-dimensional, mineable data for predictive imaging biomarker development using AI [9]. Conventional AI methods, such as logistic regression (LR), remain popular in interpretable radiomics studies [10,11], while support vector machines (SVM) and random forest (RF) often achieve higher performance due to better handling of high-dimensional and non-linear feature space [12,13]. Their performance is typically measured using the area under the receiver operating characteristic (ROC) curve (AUC) [14,15], which quantifies the degree of separation between the predictive distributions of each target group [16]. State-of-the-art approaches increasingly combine radiomics with deep learning (DL) techniques, often reporting better performance in complex settings [17].
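The "degree of separation" reading of the AUC can be made concrete: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The following is a minimal NumPy sketch of that definition; the function name `auc_pairwise` and the toy labels and scores are illustrative assumptions, not data or code from this study.

```python
import numpy as np

def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    # Score difference for every positive/negative pair.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical toy example: 1 = progression, 0 = tumor control.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_pairwise(labels, scores)  # 3 of 4 pairs are ordered correctly
```

An AUC of 0.5 therefore corresponds to score distributions that do not separate the two groups at all, and 1.0 to complete separation.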

In RT, a recent literature review reported that a large number of studies use the treatment planning CT for radiomics analysis [18], likely due to its high availability and consistency, making it the reference modality. In addition to conventional imaging data, RT offers radiomics a multiplicity of data sources derived from the treatment planning process. Besides plan-related maps (e.g., dosimetric maps), these include a large number of annotations generated as part of the treatment planning routine. Target volumes, such as the gross tumor volume (GTV), and organs at risk (OARs) are contoured to optimize the treatment plan based on the prescribed doses to the target while satisfying the constraints on the OARs. The labels that define these volumes of interest (VOIs) are at the basis of radiomics, since the extracted features are meant to capture the underlying pathological tissue characteristics within the VOI; hence, the labeling process is a crucial step [19].

Even though most of the radiomics literature in RT takes advantage of the GTV as the primary VOI [20,21], the appropriateness of this approach has been neither fully endorsed nor comprehensively investigated by the scientific community. It is well established that radiomics features are highly sensitive to small changes in VOI segmentation; however, this sensitivity is usually assessed without evaluating its actual impact on model performance [22,23,24,25]. On the one hand, the GTV may include non-pathological tissues, resulting in less precise lesion contouring compared to radiomics-specific tumor delineations. On the other hand, the GTV is more standardized, being based on international contouring guidelines [26], and represents a convenient opportunity for robust radiomics model development. Indeed, despite the great potential of radiomics models, the need for time-consuming data labeling and sub-optimal performance may hinder the accessibility and clinical relevance of such tools. In recent years, significant advancements in automatic contouring have been made thanks to novel deep learning models [27]. Such approaches have been successfully adopted not only for OARs [28,29] but also for lesion segmentation [30], offering substantial benefits by increasing consistency while reducing the time required for contouring [29,31]. However, the most promising applications are reported for brain tumors [32,33], while other morphologically complex tumors, such as head and neck cancers, are still difficult to contour reliably.

To the best of our knowledge, only one study has evaluated the actual impact of GTV versus a dedicated radiomics-specific lesion segmentation on the prognostic performance in a large cohort of head and neck cancer patients treated with RT [34]. However, the study was limited to a standard statistical model (Cox proportional hazard) with no further investigation of more complex machine learning classifiers, commonly used in radiomics applications [35].

In the context of particle therapy, the attention to resource and time consumption is even more pronounced, given the increased technological costs characterizing this advanced RT modality [36,37]. Our study aimed to explore the differences in predictive and prognostic performance of CT-based radiomics models in predicting head and neck adenoid cystic carcinoma (ACC) progression after proton therapy (PT), relying on planning GTV versus dedicated pre-treatment tumor contour for radiomics analysis (TRAD).

2. Materials and Methods

2.1. Study Population

Patients with histologically proven adenoid cystic carcinoma of the salivary glands treated with PT at the National Center for Oncological Hadrontherapy (CNAO) in Pavia (Italy) between September 2017 and October 2022 were included in this study. Eligible patients had macroscopic disease: they had undergone surgery with macroscopic surgical margins, had refused surgery, or had unresectable tumors. Patients’ data were retrieved after clinical study approval by our institute’s referral Ethical Committee (CNAO OSS 34 2021, approved on 1 April 2021 by the Comitato Etico Pavia). All included patients provided written informed consent, and the study was conducted in compliance with the Declaration of Helsinki.

2.2. Clinical Data and Outcome Variable for Modeling

Follow-up and clinical data were retrieved from the Institutional Longitudinal Clinical Registry (REGAL, NCT05203250) to derive the main study sample features and the progression events, defined either as local or distant relapses [38]. Specifically, local progression events were radiologically assessed during follow-up magnetic resonance imaging (MRI) exams, taking advantage of contrast-enhanced T1-weighted, T2-weighted, and diffusion-weighted sequences. Instead, distant progression was derived from clinical reports. For this study, a binary outcome variable was constructed to label patients who have experienced an event of either local or distant relapse, which served as the target variable for comparing the GTV- and TRAD-based radiomics models.

2.3. Imaging, Tumor Segmentations, and Labels

Prior to treatment, each patient underwent a CT scan (Somatom Sensation Open, Siemens Healthcare, Erlangen, Germany) for treatment planning purposes. The main characteristics of the simulation CT acquisition protocol adopted at our institution were as follows: slice thickness = 2 mm; pixel spacing = 0.98 × 0.98 mm²; convolution kernel = H31s; and reconstruction matrix = 512 × 512. Before acquiring the simulation CT, immobilization systems (i.e., tailored thermoplastic mask and headrest) were customized for each patient and were used for both the simulation MRI and the actual proton therapy. Common MRI protocols included the following sequences: T1-weighted turbo spin echo, T2-weighted turbo spin echo without fat-saturation (T2w), diffusion-weighted imaging with two-dimensional echo planar imaging, and fat-saturated volumetric contrast-enhanced T1-weighted (T1w-ce) sequences. Simulation MRI sequences were imported into the treatment planning system (RayStation, RaySearch Laboratories, Stockholm, Sweden), along with the simulation CT. MRI sequences, especially the T1w-ce and T2w images, were co-registered with the planning CT to aid the radiotherapist in target and OAR contouring by virtue of their superior soft tissue contrast.

The GTV was contoured by a team dedicated to head and neck pathology. The RTStruct containing the GTV was then retrieved for radiomics analysis. However, the GTV contoured for treatment purposes often included a non-negligible amount of surrounding non-pathological tissue. In accordance with previous literature [39,40] and based on the radiologist’s indication, the GTV was cleaned to include only voxels within the soft tissue range (Hounsfield unit (HU): [−200, +600]). Separately, an expert radiologist, who also took advantage of the same set of co-registered MRI sequences, delineated a dedicated tumor volume for radiomics analysis (TRAD) on the simulation CT. Figure 1 shows the original GTV, the cleaned GTV, and the TRAD segmentations. Note that hereafter, ‘GTV’ denotes the cleaned GTV unless explicitly stated otherwise.
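The HU-based cleaning step amounts to intersecting the contour mask with an intensity window. The sketch below illustrates this on a toy array; it assumes the CT and the mask are already on the same voxel grid and that both window bounds are inclusive, and the function name `clean_mask_by_hu` is hypothetical rather than taken from the study's code.

```python
import numpy as np

def clean_mask_by_hu(ct_hu, mask, hu_min=-200.0, hu_max=600.0):
    """Keep only mask voxels whose CT value lies in [hu_min, hu_max]."""
    ct_hu = np.asarray(ct_hu, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if ct_hu.shape != mask.shape:
        raise ValueError("CT and mask must share the same grid")
    # Assumption: the soft tissue window is closed on both ends.
    in_window = (ct_hu >= hu_min) & (ct_hu <= hu_max)
    return mask & in_window

# Hypothetical 1 x 5 "image": air, fat, soft tissue, dense bone, soft tissue.
ct = np.array([[-1000.0, -120.0, 40.0, 900.0, 60.0]])
gtv = np.array([[True, True, True, True, False]])
cleaned = clean_mask_by_hu(ct, gtv)
```

In this toy case the air and bone voxels are dropped from the contour, while the last voxel stays excluded because it was never part of the original mask.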

2.4. Radiomics Feature Extraction

The PyRadiomics (v3.1.0) package [41] was used to extract 107 original CT radiomics features from GTV and TRAD contours. Specifically, the extracted handcrafted radiomics features belonged to three families: shape, first-order, and texture. Shape features describe the geometry and physical form of the VOI, independent of the voxel intensities inside it; first-order features describe the distribution of voxel intensities within the VOI, ignoring any spatial relationships; texture features quantify spatial relationships between voxels and patterns, perceived as texture, and reflect tumor heterogeneity [19]. Before feature extraction, the CT volume was resampled to have an isotropic voxel size of 2 mm³ with a B-spline interpolator to ensure rotational invariance of texture features [19] and improve the overall reproducibility of radiomics features [42,43]. A bin-width discretization strategy (width = 25) was adopted to extract texture features [44,45].
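To illustrate what a bin-width discretization of 25 does to the intensities before texture matrices are built, the sketch below applies the fixed-bin-width rule described in the PyRadiomics documentation, floor(x / W) − floor(min(x) / W) + 1. This is a stand-alone NumPy illustration, not the package's own code, and the HU values are made up.

```python
import numpy as np

def discretize_fixed_bin_width(values, bin_width=25.0):
    """Map intensities to 1-based bin indices using a fixed bin width.

    Bin edges are aligned to multiples of bin_width, and the lowest
    intensity inside the VOI always falls in bin 1.
    """
    values = np.asarray(values, dtype=float)
    shifted = np.floor(values / bin_width) - np.floor(values.min() / bin_width)
    return (shifted + 1).astype(int)

# Hypothetical HU values sampled inside a VOI.
hu = np.array([-30.0, -1.0, 0.0, 24.0, 25.0, 110.0])
bins = discretize_fixed_bin_width(hu)
```

With a fixed width, the number of gray levels grows with the intensity range of each VOI, whereas a fixed bin count would rescale every VOI to the same number of levels.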

2.5. Radiomics Modeling

To reduce feature space dimensionality, a two-step feature selection process was implemented. First, low-variance features (variance < 10⁻³) and highly correlated features (Spearman’s rho ≥ 0.85) were excluded under the assumption that such features have low informative value or represent redundant contributions, respectively. Second, a sequential backward selection (SBS) approach was used to iteratively exclude the features contributing the least to the model’s performance, starting from the entire set of features and stopping at a pre-specified number of features (n = 3) [46]. This procedure was run with multiple classifiers and was optimized to select signatures that maximize the AUC from 10-fold cross-validation repeated 100 times. The classifiers included logistic regression (LR), linear support vector machine (L-SVM), and random forest (RF). Of note, normalization parameters were estimated from the radiomics features within the training fold only and then applied to both the training and test folds to avoid data leakage [47]. A signature containing 3 features (based on the 10 events-per-variable rule [48]) was thus derived from both GTV and TRAD radiomics features independently for each classifier.
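The two selection steps can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: the greedy rule used to decide which member of a correlated pair survives, the absence of tied values when ranking, and above all the scorer. Here `toy_score` is an in-sample stand-in, whereas the study optimizes a repeated cross-validated AUC with fold-wise normalization. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def rank_columns(X):
    """Rank each column (1..n); assumes no tied values within a column."""
    return np.argsort(np.argsort(X, axis=0), axis=0) + 1.0

def filter_features(X, var_thresh=1e-3, rho_thresh=0.85):
    """Indices of columns kept after the variance and correlation filters."""
    X = np.asarray(X, dtype=float)
    keep = [j for j in range(X.shape[1]) if X[:, j].var() >= var_thresh]
    # Spearman's rho is Pearson's r computed on ranks.
    rho = np.abs(np.corrcoef(rank_columns(X[:, keep]), rowvar=False))
    selected = []
    for pos, j in enumerate(keep):
        # Greedy rule (an assumption): keep a feature only if it is not
        # highly correlated with any feature already kept.
        if all(rho[pos, keep.index(s)] < rho_thresh for s in selected):
            selected.append(j)
    return selected

def backward_select(X, y, candidates, score_fn, n_keep=3):
    """Repeatedly drop the feature whose removal leaves the best score."""
    current = list(candidates)
    while len(current) > n_keep:
        trials = [(score_fn(X[:, [f for f in current if f != d]], y), d)
                  for d in current]
        _, drop = max(trials)
        current.remove(drop)
    return current

def toy_score(X_sub, y):
    """Stand-in scorer (NOT cross-validated): AUC of a mean-difference score."""
    w = X_sub[y == 1].mean(axis=0) - X_sub[y == 0].mean(axis=0)
    s = X_sub @ w
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Synthetic data: column 1 duplicates column 0, column 2 is constant.
rng = np.random.default_rng(0)
n = 40
base = rng.normal(size=(n, 5))
X = np.column_stack([base[:, 0], 2 * base[:, 0] + 1, np.full(n, 5.0),
                     base[:, 1], base[:, 2], base[:, 3], base[:, 4]])
y = (X[:, 3] > np.median(X[:, 3])).astype(int)

kept = filter_features(X)                       # filter step
signature = backward_select(X, y, kept, toy_score, n_keep=3)  # SBS step
```

In a realistic setting, `score_fn` would wrap the classifier and the repeated cross-validation loop, with scaling parameters fitted on the training fold only.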

The signature stability against contour variations was assessed using perturbation analysis (simulating inter-observer variability) [24]. Five synthetic contours were randomly generated by independently perturbing the manual GTV and TRAD segmentations through scaling along the laterolateral (width) and/or anteroposterior (height) direction(s) and shifting the contour along the craniocaudal axis [24]. The magnitude of these perturbations was constrained to maintain a Dice similarity index of at least 0.75 and a Hausdorff distance ≤ 5 mm [49]. Stability was quantified using the intraclass correlation coefficient (ICC), specifically ICC(1, 1) [50]. ICC values falling within the intervals [0.00, 0.50], (0.50, 0.75], (0.75, 0.90], and (0.90, 1.00] are indicative of poor, moderate, good, and excellent reproducibility, respectively [50].
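For reference, ICC(1, 1) is the one-way random-effects, single-measurement form, computed from the between-subject and within-subject mean squares as (MSB − MSW) / (MSB + (k − 1) MSW), where k is the number of repeated measurements per subject. A minimal sketch, with hypothetical function names and toy values, assuming one row per patient and one column per contour version:

```python
import numpy as np

def icc_1_1(ratings):
    """One-way random-effects, single-measurement ICC(1,1).

    ratings: array of shape (n_subjects, k_repeats), e.g. one feature
    measured on several contour versions of each patient.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))

def reproducibility_label(icc):
    """Map an ICC value to the categories quoted in the text."""
    if icc <= 0.50:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"

# Toy example: 2 patients, 2 contour versions each.
toy = np.array([[1.0, 3.0], [5.0, 7.0]])
toy_icc = icc_1_1(toy)              # (16 - 2) / (16 + 2)
toy_label = reproducibility_label(toy_icc)
```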

2.6. Statistical Analysis

Model performance was evaluated using the mean AUC obtained from 10-fold cross-validation. The 95% confidence interval (CI) for the AUC was estimated using the bias-corrected accelerated (BCa) bootstrap method [51] (n = 10,000), while statistical significance was measured with a permutation test (n = 1000). Subsequently, DeLong’s test [52] was applied to explore the statistical difference in performance between the GTV- and TRAD-based radiomics models. DeLong’s test is a non-parametric method designed for comparing ROC curves derived from the same sample [52]. Additionally, absolute differences in AUC between competing models were computed with bootstrap resampling (n = 10,000), which measures the magnitude of difference in discriminative ability (effect size). Following the approach of Demircioğlu et al. [53], Spearman correlations of the selected signatures from each classifier and VOI choices were also compared with one another. In graph theory, this corresponds to finding the maximum weighted matching in a bipartite graph [54]. A correlation of 1.0 would indicate that the two selected signatures are nearly identical.
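The link between signature correlation and bipartite matching can be sketched as follows. One plausible reading, stated here as an assumption rather than as the exact procedure of the study or of [53], is that features of two signatures are paired one-to-one so that the sum of absolute Spearman correlations is maximal, and the similarity is the mean correlation over the matched pairs. With three features per signature, all 3! pairings can simply be enumerated; for larger signatures, an assignment solver such as SciPy's `linear_sum_assignment` would be the usual choice.

```python
from itertools import permutations

import numpy as np

def spearman_abs(a, b):
    """Absolute Spearman correlation; assumes no tied values."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return abs(np.corrcoef(ra, rb)[0, 1])

def signature_similarity(A, B):
    """Mean |rho| under the best one-to-one pairing of columns of A and B."""
    k = A.shape[1]
    w = np.array([[spearman_abs(A[:, i], B[:, j]) for j in range(k)]
                  for i in range(k)])
    # Brute-force maximum weighted matching over all k! pairings.
    best = max(sum(w[i, p[i]] for i in range(k))
               for p in permutations(range(k)))
    return float(best / k)

# Toy signatures: B holds monotone transforms of A's columns, reordered.
A = np.array([[1, 2, 3], [2, 4, 1], [3, 6, 6],
              [4, 1, 2], [5, 3, 5], [6, 5, 4]], dtype=float)
B = np.column_stack([np.exp(A[:, 2]), -A[:, 0], A[:, 1] ** 3])
similarity = signature_similarity(A, B)
```

Because Spearman correlation depends only on ranks, the reordered and monotonically transformed signature `B` is matched back to `A` with a similarity of 1, even though no column of `B` equals a column of `A`.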

For each VOI, the top-performing classifier was further evaluated for calibration using the Brier score and for prognostic value through progression-free survival analysis. Specifically, the out-of-fold predicted probabilities from the best GTV and TRAD models were used to stratify patients into low- and high-risk groups based on optimal Youden Index [55] cut-offs. Subsequently, the Log-rank test and Cox proportional hazard (Cox-ph) models were used to compare the competing signatures (hazard ratios and concordance indices). Finally, a preliminary investigation on the association between clinically relevant prognostic variables (solid pattern, perineural spread, and T-stage) [38] and the two competitive signatures was conducted through the Mann–Whitney U test.
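Both the Brier score and the Youden-based cut-off are simple functions of the out-of-fold probabilities. The sketch below shows one way to compute them; the convention that a patient is high-risk when the probability is greater than or equal to the cut-off, and the toy values, are assumptions made for illustration.

```python
import numpy as np

def brier_score(y_true, prob):
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    prob = np.asarray(prob, dtype=float)
    return float(np.mean((prob - y_true) ** 2))

def youden_cutoff(y_true, prob):
    """Threshold maximizing J = sensitivity + specificity - 1.

    A patient is labeled high-risk when prob >= threshold.
    """
    y_true = np.asarray(y_true, dtype=bool)
    prob = np.asarray(prob, dtype=float)
    best_j, best_t = -1.0, None
    for t in np.unique(prob):
        high_risk = prob >= t
        sensitivity = high_risk[y_true].mean()
        specificity = (~high_risk[~y_true]).mean()
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_t = float(j), float(t)
    return best_t, best_j

# Hypothetical out-of-fold probabilities (1 = progression).
y = [0, 0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
cutoff, j_stat = youden_cutoff(y, p)
brier = brier_score(y, p)
high_risk_group = np.asarray(p) >= cutoff
```

The resulting boolean grouping is what would then be passed to the Log-rank test and the Cox model.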

A significance level of 0.05 was considered. All statistical analyses and radiomics modeling were performed with Python v3.7. To promote reproducibility, the code was made open-source and is available at: https://github.com/sithin-cnao/ACCRadiomics_CT.git (accessed on 23 February 2026).

3. Results

In total, 56 patients affected by ACC and treated with PT were included in this study. The most relevant clinical and treatment characteristics of the sample are summarized in Table 1; 41% (n = 23) were male, and the overall median age was 61.2 years. The majority of the treated ACC patients presented with a high T-stage (Stage 4: 84%), and the tumors were mostly located in the minor salivary glands (82%). After a median follow-up time of 26.4 months, nine patients experienced a local relapse while 18 reported a distant relapse. Overall, 20 patients experienced at least one progression event and were characterized by a larger median GTV of 88.7 cm³ (interquartile range [IQR]: 22.3–102.3) compared to 18.4 cm³ (6.8–65.4) in patients with tumor control.

As part of developing GTV- and TRAD-specific signatures predictive of ACC progression, an initial feature-filtering step was applied to reduce the original set of 107 radiomics features per VOI to 22 GTV and 28 TRAD features. Subsequently, for each combination of classifier (LR, L-SVM, and RF) and VOI (GTV and TRAD), a three-feature radiomics signature was selected using SBS (see Table 2). Figure 2 compares the ROC curves associated with the GTV vs. TRAD signatures selected by each classifier. The AUC bootstrap distributions and the null distributions of the permutation tests are provided in the Supplementary Materials (Figures S1 and S2). Overall, GTV-based models demonstrated slightly higher AUC values over a broader range (AUC range: 0.73–0.87) compared to TRAD-based models (0.73–0.80). Stability analysis confirmed excellent reproducibility of the selected signatures, with a median ICC of 0.95 (IQR: 0.91–0.98) for GTV and 0.96 (0.91–0.98) for TRAD (see Supplementary Materials, Table S1). Finally, Figure 3 presents the heatmap of absolute pairwise correlations among the selected signatures (two VOIs and three classifiers generate six signatures and, consequently, 36 pairwise comparisons). The signature compositions varied considerably, which was reflected in their overall low to moderate correlations (Spearman range: 0.15–0.61).

Note that our study aimed to compare the difference in performance of the best GTV- and TRAD-signatures. In this regard, RF emerged as the top classifier in both cases, with the GTV-derived signature showing slightly higher AUC compared to TRAD (0.87 vs. 0.80). DeLong’s test revealed that this difference was not statistically significant (p = 0.589). Likewise, their median bootstrap difference in AUCs (∆AUC) was zero. Additionally, both signatures exhibited moderate, yet similar calibration (Brier score = GTV: 0.17 vs. TRAD: 0.18). These signatures also demonstrated moderate correlation (Spearman: 0.49). In terms of clinical associations, both signatures demonstrated borderline significance with key prognostic indicators: the GTV-based signature showed a clear trend toward an association with perineural spread (p = 0.056), while the TRAD-signature exhibited a similar trend with solid tumor patterns (p = 0.053). Detailed prognostic variable association results are provided in the Supplementary Materials (see Tables S2–S4). Time-to-event analysis for progression-free survival confirmed that both signatures could stratify patients into significantly distinct risk groups (Log-rank p < 0.0001). Stratified Kaplan–Meier curves for both the competing models are provided in the Supplementary Materials, Figure S3. Cox proportional hazards modeling demonstrated robust prognostic accuracy for both the GTV-signature (C-index = 0.74; HR = 11.63, p = 0.001) and TRAD-signature (C-index = 0.72; HR = 7.01, p < 0.001).
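As a reminder of what the reported C-index measures, the sketch below computes Harrell's concordance index for right-censored data: the fraction of comparable patient pairs in which the patient who progressed earlier was assigned the higher risk. The handling of tied times (such pairs are skipped) is a simplification, and the follow-up times, events, and risks are invented for illustration only.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j].
    Tied risks count 0.5; pairs with tied times are skipped.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up (months), progression flags, and predicted risks.
months = [5, 10, 15, 20]
progressed = [1, 0, 1, 0]
predicted_risk = [0.3, 0.2, 0.6, 0.1]
c_index = harrell_c_index(months, progressed, predicted_risk)
```

Here three of the four comparable pairs are ordered correctly, giving a C-index of 0.75; a value of 0.5 would indicate no prognostic discrimination.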

Considering the GTV-derived signatures alone, the RF model demonstrated the best performance, with a mean AUC of 0.87 (95% BCa CI: [0.69, 0.91], p < 0.001). Its signature relied on macroscopic tumor morphology quantified by two shape features (Flatness, MajorAxisLength) and one texture descriptor (NGTDM_Busyness), which measures rapid intensity changes within a tumor’s local neighborhood. NGTDM_Busyness was significantly associated with perineural spread (p = 0.012), whereas MajorAxisLength was associated with both perineural spread (p = 0.025) and T-stage (p = 0.005). The remaining classifiers showed moderate performance, where L-SVM yielded an AUC of 0.77 [0.45, 0.84] (p = 0.008), followed by LR with 0.73 [0.52, 0.84] (p = 0.011). Although both their signatures were composed entirely of texture features, L-SVM preferred GLCM descriptors, while LR chose the GLDM family of features, resulting in low to moderate correlation (Spearman: 0.40). Their correlation with RF-derived signature was even lower, with a Spearman correlation range of 0.38–0.40 (Figure 3).

For the TRAD-derived features, both the L-SVM and RF models achieved a similar AUC of 0.80, with slightly different 95% BCa CIs of [0.69, 0.93] (p = 0.002) and [0.72, 0.91] (p = 0.006), respectively. Since RF exhibited slightly narrower bounds with a better lower confidence limit, it was considered the better model. In terms of signatures, the subsets selected by these two classifiers consisted mostly of texture features but were only weakly correlated with each other (Spearman: 0.35). The features selected by L-SVM were primarily GLDM texture descriptors, which quantify spatial coarseness and dependency, suggesting that this model prioritized larger-scale heterogeneity. In contrast, RF favored GLCM features (GLCM_Imc1, GLCM_MaximumProbability), which characterize local voxel-level complexity and gray-level transitions, along with a shape feature (MajorAxisLength). The texture features demonstrated a significant association with the solid tumor pattern (GLCM_Imc1, p = 0.046; GLCM_MaximumProbability, p = 0.030), whereas the shape feature was linked to T-stage (MajorAxisLength, p = 0.005). Finally, the LR model exhibited moderate performance with an AUC of 0.73 [0.55, 0.86] (p = 0.015). Like RF, its signature was mainly composed of GLCM features, and these two models therefore showed moderate to high correlation (Spearman: 0.61, Figure 3).

Figure 3. Heatmap showing Spearman correlations among the selected radiomics signatures for every combination of classifier (LR, L-SVM, and RF) and VOI (GTV and TRAD). LR: Logistic regression, L-SVM: Linear support vector machine, RF: Random forest classifier, VOI: Volume-of-interest, GTV: Gross tumor volume, TRAD: Tumor segmentation for radiomics.

4. Discussion

Our study evaluated the feasibility of adopting radiotherapy-labeled volumes, i.e., GTV, as inputs to radiomics models, instead of using dedicated TRAD segmentations. Gross tumor volumes present multiple advantages over the latter. In addition to being better standardized [56], they offer convenience in terms of time and resource consumption, as they are already part of existing standard-of-care practices. In contrast, TRAD imposes a significant resource and time burden, requiring expert manual segmentation on specialized workstations (often reserved for routine diagnostics or treatment planning) and adding an overhead of 20–30 min per patient [34]. This additional time ultimately hinders the large-scale adoption of radiomics. Beyond these practical benefits, our findings suggest that, even with a limited sample, the predictive performance of GTV-based CT radiomics models was comparable to that of TRAD-based models. Indeed, no statistically significant differences were detected between the two best competing models (p = 0.589, RF classifier), and both demonstrated good discriminatory capability (AUC of 0.87 and 0.80 for GTV and TRAD, respectively) in classifying relapse among ACC patients treated with proton therapy. Additionally, both signatures provided robust risk stratification for progression-free survival, clearly distinguishing high- from low-risk cohorts (Log-rank p < 0.0001).

Methodologically, we aimed to explore the impact of using GTV instead of TRAD from a modeling perspective in the context of ACC relapse prediction after PT. For a fair comparison, a standardized HU window was applied to the GTV to exclude non-tumor voxels (e.g., air, gas, and bone fragments) that could introduce confounders in the extracted features. To the best of our knowledge, only one study (Fontaine et al. [34]) has investigated the differences in predictive capabilities between radiomics models based on the cleaned radiotherapy GTV and a dedicated lesion contour. Fontaine et al. [34] found that survival models (such as Cox-ph) relying on radiomics features extracted from dedicated lesion volumes outperformed those based on GTV in predicting progression-free survival in a large cohort of oropharyngeal patients. They attributed this finding to the lower information content of GTV compared to TRAD. The underlying assumption was that the GTV includes tissues beyond the macroscopically visible tumor, which are likely non-pathological and could dilute the tumor characterization that is vital to predictive modeling. Conversely, the radiomics literature often assumes that microstructures near the visible lesion carry meaningful information for progression prediction. Radiomics studies grounded on this assumption often evaluate peritumoral expansions; nevertheless, a consensus is yet to be reached. Specifically, Keek et al. [57] reported that CT radiomics of peritumoral tissues, obtained through GTV expansion, did not improve the prognostic performance of clinical models in head and neck squamous cell carcinomas (HNSCC). On the other hand, in another cohort of HNSCC patients, Tang et al. [21] reported similar model performances with CT radiomics based on GTV and planning target volume (PTV).

Even though we focused on the original segmentations rather than their expansions, our preliminary analyses may support both the observation of Fontaine et al. [34] and the peritumoral importance hypothesis. Specifically, the best TRAD-signature showed a borderline significant association with the solid pattern of the tumor (p = 0.053), while the best GTV-signature showed one with perineural spread (p = 0.056). Hence, we could hypothesize that the TRAD-signature derives relevant predictive information from the texture of pure pathological tissue (solid pattern) while the GTV-signature captures the broader environment (perineural spread). Additionally, Fontaine et al. [34] limited their analysis to conventional survival models without exploring more complex machine learning approaches common in the radiomics field.

Indeed, complex AI models are better suited to capture non-linear relationships that may likely occur between predictors (such as radiomics features) and outcome [58,59]. Radiomics features, through lesion shape, texture, and image intensities, are supposed to reflect complex biological characteristics of the diseased tissue [60] that likely hold no linear connection to the outcome. Under this hypothesis, the predictive potential of the radiomics signature may be fully leveraged with complex non-parametric and non-linear models. Consistently, the literature reports superior performances for complex models compared to Cox-ph models in radiomics studies [61,62]. Indeed, when we translated the RF model predictions into a time-to-event analysis framework, both the GTV and TRAD signatures successfully stratified patients into significantly distinct risk groups for progression-free survival (Log-rank p < 0.0001). Furthermore, Cox-ph modeling confirmed robust prognostic accuracy for both the GTV (C-index = 0.74; HR = 11.63) and TRAD (C-index = 0.72; HR = 7.01) signatures. These findings, combined with the different experimental design of Fontaine et al.’s study [34], including the different tumor histology and endpoint, may explain the different conclusions reached in this study.

The high performance of the GTV-based signature likely reflects the infiltrative nature of ACCs, which frequently exhibit perineural invasion beyond the primary mass [63]. By capturing a broader contour, the GTV model effectively incorporates these prognostic areas, despite HU-based thresholding (following IBSI [19], Fontaine et al. [34], and Bogowicz et al. [64]) to reduce noise from surrounding healthy tissues and address partial volume effects. Although DeLong’s test reported no statistical differences between the two competitive models, it may lack sufficient sensitivity to detect true differences due to the study’s small sample size. Nonetheless, the 95% BCa CI of ROC AUCs for these models showed substantial overlap, and their median absolute differences were near zero, providing additional indications supporting the absence of meaningful difference in predictive performance. Note that bootstrap analysis is generally considered more robust in small-sample settings [65,66,67].
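The bootstrap comparison of the two models can be sketched as a paired resampling of patients: each bootstrap sample draws the same patients for both models, so the difference in AUC reflects the models rather than the sample composition. The code below is a simplified illustration, not the study's implementation; the function names, the number of resamples, the rule of redrawing samples that contain a single class, and the toy scores are all assumptions.

```python
import numpy as np

def auc(y, s):
    """AUC from all positive/negative score pairs (ties count 0.5)."""
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def bootstrap_abs_auc_diff(y, s_a, s_b, n_boot=2000, seed=0):
    """Paired bootstrap distribution of |AUC_a - AUC_b|."""
    y, s_a, s_b = (np.asarray(v) for v in (y, s_a, s_b))
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)  # same patients for both models
        if y[idx].min() == y[idx].max():
            continue  # only one class drawn: AUC undefined, redraw
        diffs.append(abs(auc(y[idx], s_a[idx]) - auc(y[idx], s_b[idx])))
    return np.array(diffs)

# Hypothetical scores from two models on the same 10 patients.
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
model_a = np.array([0.1, 0.2, 0.3, 0.6, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9])
model_b = np.array([0.2, 0.1, 0.4, 0.3, 0.5, 0.3, 0.6, 0.4, 0.9, 0.7])
delta = bootstrap_abs_auc_diff(y_toy, model_a, model_b, n_boot=500)
median_delta = float(np.median(delta))

# Sanity check: rank-identical scores give zero difference in every resample.
same = bootstrap_abs_auc_diff(y_toy, model_a, 2 * model_a + 1, n_boot=200)
```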

In our analyses, the signature was developed using a sequential backward elimination approach combined with three popular choices of classifiers in radiomics—LR, L-SVM, and RF [53]. Among these, the RF-derived signature demonstrated the best performance in predicting ACC progression after PT, independent of the changes in tumor delineation (GTV vs. TRAD). This is not surprising because RF is known to achieve high performance, often outperforming conventional methods in tasks like tumor classification, prognosis prediction, and treatment response evaluation [12,68,69]. Its strength stems from aggregating predictions from several uncorrelated decision trees constructed through subsampling or bagging of data and feature space, thereby accounting for potential data and feature variability, ultimately yielding better predictive performance while reducing the risks of overfitting [70]. In a cohort of head and neck patients, Peng et al. [71] investigated the prognostic effectiveness of a multiplicity of machine learning models compared to traditional statistical approaches, reporting that the RF model outperformed Cox-ph, regardless of the lesion site.

The feature compositions of the signatures identified in our study varied substantially across models and VOIs. Although most selected features were texture-based, they originated from different texture classes (GLCM, GLDM, etc.). GTV-derived signatures included additional shape features (e.g., Flatness and MajorAxisLength), whereas TRAD-derived signatures mostly included additional first-order features (e.g., Median and InterquartileRange). Consistent with these findings, the signatures demonstrated low to moderate correlations, indicating limited overlap across feature selection results obtained with different models. This variability aligns with the findings of Demircioğlu et al. [53], who reported that radiomics signatures are highly dependent on the underlying feature selection method. They further observed that models with statistically similar predictive performance may rely on different, yet potentially correlated features [53]. Indeed, in our study, the best performing GTV and TRAD signatures also achieved statistically comparable performance and demonstrated nearly 50% similarity.

Even though radiomics signatures are intrinsically specific to the clinical endpoint and the pathology under investigation, texture and shape features were found to be associated with tumor aggressiveness in multiple histologies [32,72,73]. For example, in high-grade meningioma patients, shape features such as Flatness and MajorAxisLength were among the top-10 most important features in an RF model designed to predict the risk class according to integrated molecular-morphologic classification [32]. On the other hand, texture features, commonly associated with tumor heterogeneity, are well known among several radiomics models developed to predict prognosis [73] and characterize tumor aggressiveness [72,74]. However, to date, there is no evidence of a radiomics signature predictive of ACC progression after PT.

To the best of our knowledge, despite the small dataset, this is the first study investigating radiomics models to predict ACC progression after PT. However, the utility of radiomics in improving relapse-free survival prediction after surgery and adjuvant RT in the same histological type was reported in a recent study [75]. In our analysis, because the low incidence of ACC resulted in a small sample size, methodologically rigorous approaches were adopted to optimize the reliability of our conclusions. Specifically, 95% confidence intervals were estimated using the BCa bootstrap method [51], and permutation tests [16,76] were performed to evaluate the statistical significance of the 10-fold cross-validated AUC, thus quantifying the reliability of the measured metrics. Unlike parametric tests, the permutation test is non-parametric and makes no distributional assumptions that may be violated with a small sample size [76]. Moreover, it derives the p-value from a null distribution constructed directly from the observed data, which allows control over the Type I error rate. Such a statistical approach has been investigated in pre-clinical [76] and pilot [77] studies and was recently suggested as a valuable tool to deal with small sample sizes [76]. Despite the methodological rigor, it is worth noting that the development of a radiomics signature for ACC progression after PT was beyond the scope of this study. Indeed, for this purpose, a larger dataset and predictors other than CT radiomics features (e.g., multi-modal imaging, clinical, treatment planning, deep features) should be considered.
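The logic of the label-permutation test can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: it shuffles the labels against fixed scores instead of refitting the model within 10-fold cross-validation for every permutation. The names `auc` and `permutation_p_value`, the toy data, and the add-one correction of the p-value are assumptions of this sketch.

```python
import random


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Return (observed AUC, one-sided p-value) from a label-permutation test."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the labels breaks any link between scores and outcome,
        # so each shuffled AUC is one draw from the null distribution.
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Invented toy data (7 events, 9 non-events); not taken from the study.
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40,
              0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observed_auc, p_value = permutation_p_value(toy_labels, toy_scores)
print(f"AUC = {observed_auc:.3f}, permutation p-value = {p_value:.4f}")
```

With a finite number of random permutations, the resulting p-value is a Monte Carlo estimate whose resolution is limited by the number of permutations drawn.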

Our results encourage the use of GTV for developing radiomics models aimed at predicting ACC progression after PT. However, future efforts toward clinically accessible radiomics should evaluate automatic contouring for its intrinsic standardization capabilities, coupled with a significant decrease in time and resource consumption [78]. Nonetheless, the relative rarity and morphological peculiarity of this tumor type [79] may hinder the development of robust auto-segmentation tools for ACC.

Finally, we would like to point out some of the main limitations of our study. First, although the retrospective, single-center design ensured high technical homogeneity in imaging protocols and proton therapy delivery, it inherently carries risks of selection bias and variations in clinical follow-up, limiting the immediate generalizability of our findings. Even though we applied strict inclusion criteria and standardized our radiomics extraction pipeline to ensure data integrity, a prospective study design should be considered as a future direction. Second, while this cohort was appropriate for preliminary findings, the relatively small sample size (n = 56) may reduce the statistical power to detect more subtle differences between the GTV and TRAD radiomics models. Moreover, the number of observed events (n = 20) necessitated a conservative approach to feature selection. By restricting our models to a maximum of three features, we maintained a reasonable events-per-variable ratio to prevent overfitting [48]; although this may not fully capture the complete spectrum of tumor heterogeneity, it reduces Type I error. Third, while CT is the standard for proton therapy planning and benchmark studies (e.g., Fontaine et al. [34]), the exclusion of other modalities like MRI or PET means that potentially synergistic biological features are omitted. Fourth, although the use of 10-fold cross-validation and rigorous statistical analysis establishes internal robustness, the lack of external validation renders our findings hypothesis-generating. Future validation in larger, multi-institutional cohorts is necessary to confirm these findings across diverse scanner types and clinical workflows. Ultimately, because this study focused on a specific lesion, treatment, and clinical endpoint, our conclusions cannot be directly extended to other contexts.
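As a quick check of the events-per-variable (EPV) arithmetic behind the three-feature limit, using the 20 events and three features stated above (counting only the features retained in the final model, rather than every candidate screened during selection, is an assumption of this sketch):

```latex
\[
\mathrm{EPV} = \frac{\text{number of events}}{\text{number of features in the model}} = \frac{20}{3} \approx 6.7
\]
```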

5. Conclusions

In our study, GTV-based CT radiomics models for predicting ACC progression after PT showed predictive and prognostic performance comparable to that of models developed with dedicated lesion segmentation for radiomics analysis. In this framework, GTV showed the potential to represent a valuable, easy-to-standardize, and less time- and resource-consuming lesion segmentation strategy compared to dedicated contours. Comprehensive studies with larger datasets and external validation are warranted to confirm our findings.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies14030144/s1, Figure S1: Bootstrap distribution of 10-fold cross-validated AUC used to estimate the 95% BCa confidence interval for each combination of classifier (LR, L-SVM, and RF) and VOI segmentation (GTV and TRAD). Bootstrap sampling with replacement was repeated 10,000 times; Figure S2: Null distribution derived from the permutation test of the 10-fold cross-validated AUC for each combination of classifier (LR, L-SVM, and RF) and VOI segmentation (GTV and TRAD). Class labels were randomly permuted 1000 times, and the AUC was recalculated for each permutation to construct the null distribution; Figure S3: Progression-free survival Kaplan–Meier curves for the GTV model (on the left) and the TRAD model (on the right). Youden Index cut-offs, derived from out-of-fold probabilities, were used to stratify low- and high-risk groups; Table S1: ICC(1, 1) estimates (95% CI) of radiomic features selected by at least one classifier for the target VOI (GTV vs. TRAD); Table S2: Summary of the best GTV- and TRAD-models’ out-of-fold probabilities and their signature values, with respect to the T-stage. The Mann-Whitney U test was used to evaluate the statistical association between the models’ features and the T-stage; Table S3: Summary of the best GTV- and TRAD-models’ out-of-fold probabilities and their signature values, with respect to the solid pattern. The Mann-Whitney U test was used to evaluate the statistical association between the models’ features and the solid pattern; Table S4: Summary of the best GTV- and TRAD-models’ out-of-fold probabilities and their signature values, with respect to the perineural spread. The Mann-Whitney U test was used to evaluate the statistical association between the models’ features and the perineural spread.

Author Contributions

Conceptualization, G.F., S.T.S., and E.O.; methodology, G.F. and S.T.S.; software, G.F. and S.T.S.; formal analysis, G.F. and S.T.S.; investigation, G.F., S.T.S., and E.O.; resources, E.O.; data curation, G.F. and S.T.S.; writing—original draft preparation, G.F. and S.T.S.; writing—review and editing, G.F., S.T.S., L.L., M.B., C.F., L.T., B.V., V.D., S.M., S.I., and E.O.; visualization, G.F. and S.T.S.; supervision, E.O.; project administration, E.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the National Center for Oncological Hadrontherapy’s referral Ethical Committee (protocol code: CNAO OSS 34 2021 and date of approval: 2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data will be shared upon reasonable request; however, the extracted radiomics features have been made publicly available at https://github.com/sithin-cnao/ACCRadiomics_CT.git, accessed on 23 February 2026.

Acknowledgments

The authors wish to express their sincere gratitude to Giulio Di Ciaccia for his valuable contributions during the early stages of this project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACC	Adenoid cystic carcinoma
HNSCC	Head and neck squamous cell carcinoma
EMR	Electronic medical records
RT	Radiotherapy
PT	Proton therapy
IGRT	Image-guided radiotherapy
GTV	Gross tumor volume
PTV	Planning target volume
OAR	Organ at risk
TRAD	Tumor segmentation for radiomics
VOI	Volume-of-interest
CT	Computed tomography
HU	Hounsfield unit
MRI	Magnetic resonance imaging
T2w	T2-weighted MR
T1w-ce	T1-weighted contrast-enhanced MR
PACS	Picture archiving and communication system
AI	Artificial intelligence
GLCM	Gray level co-occurrence matrix
GLSZM	Gray level size zone matrix
GLDM	Gray level dependence matrix
NGTDM	Neighborhood gray tone difference matrix
MCC	Maximal correlation coefficient
Imc	Informational measure of correlation
LR	Logistic regression
L-SVM	Linear support vector machine
RF	Random forest
SBS	Sequential backward selection
ROC	Receiver operating characteristic
AUC	Area under the ROC curve
CI	Confidence interval
BCa	Bias-corrected accelerated

References

1. Dicuonzo, G.; Galeone, G.; Shini, M.; Massari, A. Towards the Use of Big Data in Healthcare: A Literature Review. Healthcare 2022, 10, 1232.
2. Senbekov, M.; Saliev, T.; Bukeyeva, Z.; Almabayeva, A.; Zhanaliyeva, M.; Aitenova, N.; Toishibekov, Y.; Fakhradiyev, I. The Recent Progress and Applications of Digital Technologies in Healthcare: A Review. Int. J. Telemed. Appl. 2020, 2020, 8830200.
3. Stafie, C.S.; Sufaru, I.-G.; Ghiciuc, C.M.; Stafie, I.-I.; Sufaru, E.-C.; Solomon, S.M.; Hancianu, M. Exploring the Intersection of Artificial Intelligence and Clinical Healthcare: A Multidisciplinary Review. Diagnostics 2023, 13, 1995.
4. Fountzilas, E.; Pearce, T.; Baysal, M.A.; Chakraborty, A.; Tsimberidou, A.M. Convergence of Evolving Artificial Intelligence and Machine Learning Techniques in Precision Oncology. npj Digit. Med. 2025, 8, 75.
5. Jaffray, D.A.; Knaul, F.; Baumann, M.; Gospodarowicz, M. Harnessing Progress in Radiotherapy for Global Cancer Control. Nat. Cancer 2023, 4, 1228–1238.
6. De Benedictis, A.; Lettieri, E.; Gastaldi, L.; Masella, C.; Urgu, A.; Tartaglini, D. Electronic Medical Records Implementation in Hospital: An Empirical Investigation of Individual and Organizational Determinants. PLoS ONE 2020, 15, e0234108.
7. Tornero Costa, R.; Adib, K.; Salama, N.; Davia, S.; Martínez Millana, A.; Traver, V.; Davtyan, K. Electronic Health Records and Data Exchange in the WHO European Region: A Subregional Analysis of Achievements, Challenges, and Prospects. Int. J. Med. Inform. 2025, 194, 105687.
8. Mansoori, B.; Erhard, K.K.; Sunshine, J.L. Picture Archiving and Communication System (PACS) Implementation, Integration & Benefits in an Integrated Health System. Acad. Radiol. 2012, 19, 229–235.
9. Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; Van Stiphout, R.G.P.M.; Granton, P.; Zegers, C.M.L.; Gillies, R.; Boellard, R.; Dekker, A.; et al. Radiomics: Extracting More Information from Medical Images Using Advanced Feature Analysis. Eur. J. Cancer 2012, 48, 441–446.
10. Zhang, F.; Bai, J.; Liu, B.; Yuan, M.; Fang, C.; Yang, G.; Qiao, Y. Development and Validation of a CT-Based Radiomics Nomogram for Predicting Cervical Lymph Node Metastasis in Papillary Thyroid Carcinoma. Cancer Biomark. 2025, 42, 18758592251322028.
11. Zhong, X.; Salahuddin, Z.; Chen, Y.; Woodruff, H.C.; Long, H.; Peng, J.; Xie, X.; Lin, M.; Lambin, P. An Interpretable Radiomics Model Based on Two-Dimensional Shear Wave Elastography for Predicting Symptomatic Post-Hepatectomy Liver Failure in Patients with Hepatocellular Carcinoma. Cancers 2023, 15, 5303.
12. Parmar, C.; Grossmann, P.; Bussink, J.; Lambin, P.; Aerts, H.J.W.L. Machine Learning Methods for Quantitative Radiomic Biomarkers. Sci. Rep. 2015, 5, 13087.
13. Su, H.-Z.; Li, Z.-Y.; Hong, L.-C.; Wu, Y.-H.; Zhang, F.; Zhang, Z.-B.; Zhang, X.-D. Machine Learning Model for Diagnosing Salivary Gland Adenoid Cystic Carcinoma Based on Clinical and Ultrasound Features. Insights Imaging 2025, 16, 96.
14. Lambin, P.; Leijenaar, R.T.H.; Deist, T.M.; Peerlings, J.; De Jong, E.E.C.; Van Timmeren, J.; Sanduleanu, S.; Larue, R.T.H.M.; Even, A.J.G.; Jochems, A.; et al. Radiomics: The Bridge between Medical Imaging and Personalized Medicine. Nat. Rev. Clin. Oncol. 2017, 14, 749–762.
15. Lambin, P.; Woodruff, H.C.; Mali, S.A.; Zhong, X.; Kuang, S.; Lavrova, E.; Khan, H.; Lekadir, K.; Zwanenburg, A.; Deasy, J.; et al. Radiomics Quality Score 2.0: Towards Radiomics Readiness Levels and Clinical Translation for Personalized Medicine. Nat. Rev. Clin. Oncol. 2025, 22, 831–846.
16. Dinga, R.; Penninx, B.W.J.H.; Veltman, D.J.; Schmaal, L.; Marquand, A.F. Beyond Accuracy: Measures for Assessing Machine Learning Models, Pitfalls and Guidelines. BioRxiv 2019.
17. Demircioğlu, A. Are Deep Models in Radiomics Performing Better than Generic Models? A Systematic Review. Eur. Radiol. Exp. 2023, 7, 11.
18. Beddok, A.; Orlhac, F.; Rozenblum, L.; Calugaru, V.; Créhange, G.; Dercle, L.; Nioche, C.; Thariat, J.; Marin, T.; El Fakhri, G.; et al. Radiomics-Driven Personalized Radiotherapy for Primary and Recurrent Tumors: A General Review with a Focus on Reirradiation. Cancer/Radiothérapie 2024, 28, 597–602.
19. Zwanenburg, A.; Leger, S.; Vallières, M.; Löck, S. Image Biomarker Standardisation Initiative. Radiology 2020, 295, 328–338.
20. Geraghty, B.J.; Dasgupta, A.; Sandhu, M.; Malik, N.; Maralani, P.J.; Detsky, J.; Tseng, C.-L.; Soliman, H.; Myrehaug, S.; Husain, Z.; et al. Predicting Survival in Patients with Glioblastoma Using MRI Radiomic Features Extracted from Radiation Planning Volumes. J. Neurooncol. 2022, 156, 579–588.
21. Fh, T.; Cyw, C.; Eyw, C. Radiomics AI Prediction for Head and Neck Squamous Cell Carcinoma (HNSCC) Prognosis and Recurrence with Target Volume Approach. BJR|Open 2021, 3, 20200073.
22. Pistel, M.; Brock, L.; Laun, F.B.; Erber, R.; Weiland, E.; Uder, M.; Wenkel, E.; Ohlmeyer, S.; Bickelhaupt, S. Stability of Radiomic Features against Variations in Lesion Segmentations Computed on Apparent Diffusion Coefficient Maps of Breast Lesions. Diagnostics 2024, 14, 1427.
23. Cama, I.; Candiani, V.; Roccatagliata, L.; Fiaschi, P.; Rebella, G.; Resaz, M.; Piana, M.; Campi, C. Segmentation Agreement and the Reliability of Radiomics Features. Adv. Comput. Sci. Eng. 2023, 1, 202–217.
24. Thulasi Seetha, S.; Garanzini, E.; Tenconi, C.; Marenghi, C.; Avuzzi, B.; Catanzaro, M.; Stagni, S.; Villa, S.; Chiorda, B.N.; Badenchini, F.; et al. Stability of Multi-Parametric Prostate MRI Radiomic Features to Variations in Segmentation. J. Pers. Med. 2023, 13, 1172.
25. Liu, R.; Elhalawani, H.; Radwan Mohamed, A.S.; Elgohari, B.; Court, L.; Zhu, H.; Fuller, C.D. Stability Analysis of CT Radiomic Features with Respect to Segmentation Variation in Oropharyngeal Cancer. Clin. Transl. Radiat. Oncol. 2020, 21, 11–18.
26. Lin, D.; Lapen, K.; Sherer, M.V.; Kantor, J.; Zhang, Z.; Boyce, L.M.; Bosch, W.; Korenstein, D.; Gillespie, E.F. A Systematic Review of Contouring Guidelines in Radiation Oncology: Analysis of Frequency, Methodology, and Delivery of Consensus Recommendations. Int. J. Radiat. Oncol. Biol. Phys. 2020, 107, 827–835.
27. Bibault, J.-E.; Giraud, P. Deep Learning for Automated Segmentation in Radiotherapy: A Narrative Review. Br. J. Radiol. 2024, 97, 13–20.
28. Korte, J.C.; Hardcastle, N.; Ng, S.P.; Clark, B.; Kron, T.; Jackson, P. Cascaded Deep Learning-based Auto-segmentation for Head and Neck Cancer Patients: Organs at Risk on T2-weighted Magnetic Resonance Imaging. Med. Phys. 2021, 48, 7757–7772.
29. Lustberg, T.; Van Soest, J.; Gooding, M.; Peressutti, D.; Aljabar, P.; Van Der Stoep, J.; Van Elmpt, W.; Dekker, A. Clinical Evaluation of Atlas and Deep Learning Based Automatic Contouring for Lung Cancer. Radiother. Oncol. 2018, 126, 312–317.
30. Lin, H.; Xiao, H.; Dong, L.; Teo, K.B.-K.; Zou, W.; Cai, J.; Li, T. Deep Learning for Automatic Target Volume Segmentation in Radiation Therapy: A Review. Quant. Imaging Med. Surg. 2021, 11, 4847–4858.
31. Kocher, M.; Ruge, M.I.; Galldiks, N.; Lohmann, P. Applications of Radiomics and Machine Learning for Radiotherapy of Malignant Brain Tumors. Strahlenther. Onkol. 2020, 196, 856–867.
32. Kertels, O.; Delbridge, C.; Sahm, F.; Ehret, F.; Acker, G.; Capper, D.; Peeken, J.C.; Diehl, C.; Griessmair, M.; Metz, M.-C.; et al. Imaging Meningioma Biology: Machine Learning Predicts Integrated Risk Score in WHO Grade 2/3 Meningioma. Neuro-Oncol. Adv. 2024, 6, vdae080.
33. Isensee, F.; Jaeger, P.F.; Full, P.M.; Vollmuth, P.; Maier-Hein, K.H. nnU-Net for Brain Tumor Segmentation 2020. In International MICCAI Brainlesion Workshop; Springer International Publishing: Cham, Switzerland, 2020.
34. Fontaine, P.; Andrearczyk, V.; Oreiller, V.; Abler, D.; Castelli, J.; Acosta, O.; De Crevoisier, R.; Vallières, M.; Jreige, M.; Prior, J.O.; et al. Cleaning Radiotherapy Contours for Radiomics Studies, Is It Worth It? A Head and Neck Cancer Study. Clin. Transl. Radiat. Oncol. 2022, 33, 153–158.
35. Forghani, R.; Savadjiev, P.; Chatterjee, A.; Muthukrishnan, N.; Reinhold, C.; Forghani, B. Radiomics and Artificial Intelligence for Biomarker and Prediction Model Development in Oncology. Comput. Struct. Biotechnol. J. 2019, 17, 995–1008.
36. Kraan, A.C.; Del Guerra, A. Technological Developments and Future Perspectives in Particle Therapy: A Topical Review. IEEE Trans. Radiat. Plasma Med. Sci. 2024, 8, 453–481.
37. Chen, Y.H.; Blommestein, H.M.; Klazenga, R.; Uyl-de Groot, C.; Van Vulpen, M. Costs of Newly Funded Proton Therapy Using Time-Driven Activity-Based Costing in The Netherlands. Cancers 2023, 15, 516.
38. Vischioni, B.; Bonora, M.; Fontana, G.; Scardo, S.; Brighenti, L.; D’Ambrosio, L.; Ronchi, S.; Ingargiola, R.; Camarda, A.M.; Imparato, S.; et al. Prognostic Factors and Clinical Outcomes in a Large Cohort of Head and Neck Adenoid Cystic Carcinoma Patients Treated with Proton Beam Therapy: Insights from an Italian Referral Center. Radiother. Oncol. 2025, 213, 111143.
39. Lv, W.; Feng, H.; Du, D.; Ma, J.; Lu, L. Complementary Value of Intra- and Peri-Tumoral PET/CT Radiomics for Outcome Prediction in Head and Neck Cancer. IEEE Access 2021, 9, 81818–81827.
40. Leger, S.; Zwanenburg, A.; Pilz, K.; Zschaeck, S.; Zöphel, K.; Kotzerke, J.; Schreiber, A.; Zips, D.; Krause, M.; Baumann, M.; et al. CT Imaging during Treatment Improves Radiomic Models for Patients with Locally Advanced Head and Neck Cancer. Radiother. Oncol. 2019, 130, 10–17.
41. Van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillion-Robin, J.-C.; Pieper, S.; Aerts, H.J.W.L. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77, e104–e107.
42. Bailly, C.; Bodet-Milin, C.; Couespel, S.; Necib, H.; Kraeber-Bodéré, F.; Ansquer, C.; Carlier, T. Revisiting the Robustness of PET-Based Textural Features in the Context of Multi-Centric Trials. PLoS ONE 2016, 11, e0159984.
43. Shafiq-ul-Hassan, M.; Zhang, G.G.; Latifi, K.; Ullah, G.; Hunt, D.C.; Balagurunathan, Y.; Abdalah, M.A.; Schabath, M.B.; Goldgof, D.G.; Mackin, D.; et al. Intrinsic Dependencies of CT Radiomic Features on Voxel Size and Number of Gray Levels. Med. Phys. 2017, 44, 1050–1062.
44. Larue, R.T.H.M.; Van Timmeren, J.E.; De Jong, E.E.C.; Feliciani, G.; Leijenaar, R.T.H.; Schreurs, W.M.J.; Sosef, M.N.; Raat, F.H.P.J.; Van Der Zande, F.H.R.; Das, M.; et al. Influence of Gray Level Discretization on Radiomic Feature Stability for Different CT Scanners, Tube Currents and Slice Thicknesses: A Comprehensive Phantom Study. Acta Oncol. 2017, 56, 1544–1553.
45. Coroller, T.P.; Agrawal, V.; Narayan, V.; Hou, Y.; Grossmann, P.; Lee, S.W.; Mak, R.H.; Aerts, H.J.W.L. Radiomic Phenotype Features Predict Pathological Response in Non-Small Cell Lung Cancer. Radiother. Oncol. 2016, 119, 480–486.
46. Vittinghoff, E.; McCulloch, C.E. Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression. Am. J. Epidemiol. 2007, 165, 710–718.
47. Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in Data Mining: Formulation, Detection, and Avoidance. ACM Trans. Knowl. Discov. Data 2012, 6, 1–21.
48. Peduzzi, P.; Concato, J.; Kemper, E.; Holford, T.R.; Feinstein, A.R. A Simulation Study of the Number of Events per Variable in Logistic Regression Analysis. J. Clin. Epidemiol. 1996, 49, 1373–1379.
49. Teng, X.; Zhang, J.; Ma, Z.; Zhang, Y.; Lam, S.; Li, W.; Xiao, H.; Li, T.; Li, B.; Zhou, T.; et al. Improving Radiomic Model Reliability Using Robust Features from Perturbations for Head-and-Neck Carcinoma. Front. Oncol. 2022, 12, 974467.
50. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163.
51. Farag, K.A.; Asselhab, M.A.; Binsoud, B.M.; Abobaker, Z.M. Performance Comparison of Traditional Bootstrap and Bias-Corrected and Accelerated Methods in Constructing Confidence Intervals for Non-Normal Data: A Simulation Study. Libyan J. Med. Appl. Sci. 2025, 3, 115–120.
52. DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44, 837–845.
53. Demircioğlu, A. Evaluation of the Dependence of Radiomic Features on the Machine Learning Model. Insights Imaging 2022, 13, 28.
54. Kwok, S. A Faster Algorithm for Maximum Weight Matching on Unrestricted Bipartite Graphs. arXiv 2025, arXiv:2502.20889.
55. Fluss, R.; Faraggi, D.; Reiser, B. Estimation of the Youden Index and Its Associated Cutoff Point. Biometrical J. 2005, 47, 458–472.
56. Grégoire, V.; Ang, K.; Budach, W.; Grau, C.; Hamoir, M.; Langendijk, J.A.; Lee, A.; Le, Q.-T.; Maingon, P.; Nutting, C.; et al. Delineation of the Neck Node Levels for Head and Neck Tumors: A 2013 Update. DAHANCA, EORTC, HKNPCSG, NCIC CTG, NCRI, RTOG, TROG Consensus Guidelines. Radiother. Oncol. 2014, 110, 172–181.
57. Keek, S.; Sanduleanu, S.; Wesseling, F.; De Roest, R.; Van Den Brekel, M.; Van Der Heijden, M.; Vens, C.; Giuseppina, C.; Licitra, L.; Scheckenbach, K.; et al. Computed Tomography-Derived Radiomic Signature of Head and Neck Squamous Cell Carcinoma (Peri)Tumoral Tissue for the Prediction of Locoregional Recurrence and Distant Metastasis after Concurrent Chemo-Radiotherapy. PLoS ONE 2020, 15, e0232639.
58. Huynh, B.N.; Groendahl, A.R.; Tomic, O.; Liland, K.H.; Knudtsen, I.S.; Hoebers, F.; Van Elmpt, W.; Malinen, E.; Dale, E.; Futsaether, C.M. Head and Neck Cancer Treatment Outcome Prediction: A Comparison between Machine Learning with Conventional Radiomics Features and Deep Learning Radiomics. Front. Med. 2023, 10, 1217037.
59. Grigorescu, I.; Mushari, N.A.; Tsoumpas, C.; Deprez, M. AI Methods: Understanding AI Models, Radiomic Analysis and Performance Metrics in Medical Imaging. In Artificial Intelligence for Radiographers; Malamateniou, C., Hardy, M., M. Knapp, K., Ramlaul, A., Eds.; Springer Nature: Cham, Switzerland, 2026; pp. 9–35. ISBN 978-3-032-05079-3.
60. Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D.; et al. Decoding Tumour Phenotype by Noninvasive Imaging Using a Quantitative Radiomics Approach. Nat. Commun. 2014, 5, 4006.
61. Volpe, S.; Isaksson, L.J.; Zaffaroni, M.; Pepa, M.; Raimondi, S.; Botta, F.; Presti, G.L.; Vincini, M.G.; Rampinelli, C.; Cremonesi, M.; et al. Impact of Image Filtering and Assessment of Volume-Confounding Effects on CT Radiomic Features and Derived Survival Models in Non-Small Cell Lung Cancer. Transl. Lung Cancer Res. 2022, 11, 2452–2463.
62. Zhang, D.; Luan, J.; Liu, B.; Yang, A.; Lv, K.; Hu, P.; Han, X.; Yu, H.; Shmuel, A.; Ma, G.; et al. Comparison of MRI Radiomics-Based Machine Learning Survival Models in Predicting Prognosis of Glioblastoma Multiforme. Front. Med. 2023, 10, 1271687.
63. Dantas, A.N.; De Morais, E.F.; Macedo, R.A.D.P.; Tinôco, J.M.D.L.; Morais, M.D.L.S.D.A. Clinicopathological Characteristics and Perineural Invasion in Adenoid Cystic Carcinoma: A Systematic Review. Braz. J. Otorhinolaryngol. 2015, 81, 329–335.
64. Bogowicz, M.; Riesterer, O.; Stark, L.S.; Studer, G.; Unkelbach, J.; Guckenberger, M.; Tanadini-Lang, S. Comparison of PET and CT Radiomics for Prediction of Local Tumor Control in Head and Neck Squamous Cell Carcinoma. Acta Oncol. 2017, 56, 1531–1536.
65. An, C.; Park, Y.W.; Ahn, S.S.; Han, K.; Kim, H.; Lee, S.-K. Radiomics Machine Learning Study with a Small Sample Size: Single Random Training-Test Set Split May Lead to Unreliable Results. PLoS ONE 2021, 16, e0256152.
66. Carpenter, J.; Bithell, J. Bootstrap Confidence Intervals: When, Which, What? A Practical Guide for Medical Statisticians. Statist. Med. 2000, 19, 1141–1164.
67. Cevenini, G.; Barbini, P. A Bootstrap Approach for Assessing the Uncertainty of Outcome Probabilities When Using a Scoring System. BMC Med. Inform. Decis. Mak. 2010, 10, 45.
68. Yuruk, Y.Y. Uncover This Tech Term: Random Forest. Korean J. Radiol. 2025, 26, 998.
69. Mohamadi, Z.; Shafizadeh, A.; Aliyan, Y.; Shayesteh, S.F.; Goudarzi, P.; Khodabandeh, A.; Vaghari, A.; Ashrafi, H.; Bahrami, O.; ZarinKhat, A.; et al. The Application of Random Forest-Based Models in Prognostication of Gastrointestinal Tract Malignancies: A Systematic Review. Front. Artif. Intell. 2025, 8, 1517670.
70. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149.
71. Peng, J.; Lu, Y.; Chen, L.; Qiu, K.; Chen, F.; Liu, J.; Xu, W.; Zhang, W.; Zhao, Y.; Yu, Z.; et al. The Prognostic Value of Machine Learning Techniques versus Cox Regression Model for Head and Neck Cancer. Methods 2022, 205, 123–132.
72. Yap, F.Y.; Varghese, B.A.; Cen, S.Y.; Hwang, D.H.; Lei, X.; Desai, B.; Lau, C.; Yang, L.L.; Fullenkamp, A.J.; Hajian, S.; et al. Shape and Texture-Based Radiomics Signature on CT Effectively Discriminates Benign from Malignant Renal Masses. Eur. Radiol. 2021, 31, 1011–1021.
73. Ling, X.; Bazyar, S.; Ferris, M.; Molitoris, J.; Allor, E.; Thomas, H.; Arons, D.; Schumaker, L.; Krc, R.; Mendes, W.S.; et al. Identification of CT Based Radiomic Biomarkers for Progression Free Survival in Head and Neck Squamous Cell Carcinoma. Sci. Rep. 2025, 15, 1279.
74. Shu, J.; Tang, Y.; Cui, J.; Yang, R.; Meng, X.; Cai, Z.; Zhang, J.; Xu, W.; Wen, D.; Yin, H. Clear Cell Renal Cell Carcinoma: CT-Based Radiomics Features for the Prediction of Fuhrman Grade. Eur. J. Radiol. 2018, 109, 8–12.
75. Rondi, P.; Tomasoni, M.; Cunha, B.; Rampinelli, V.; Bossi, P.; Guerini, A.; Lombardi, D.; Borghesi, A.; Magrini, S.M.; Buglione, M.; et al. Radiomic and Clinical Model in the Prognostic Evaluation of Adenoid Cystic Carcinoma of the Head and Neck. Cancers 2024, 16, 3926.
76. Unseld, T.; Ruckerbauer, L.; Mayer, B. Permutation Tests Are a Useful Alternative Approach for Statistical Hypothesis Testing in Small Sample Sizes. Altern. Lab. Anim. 2025, 53, 130–137.
77. Robustelli Test, A.; Bortolotto, C.; Thulasi Seetha, S.; Marrocco, A.; Pairazzi, C.; Messana, G.; Brizzi, L.; Zacà, D.; Grimm, R.; Brero, F.; et al. Multisequence MRI-Driven Assessment of PD-L1 Expression in Non-Small Cell Lung Cancer: A Pilot Study. Biomed. Phys. Eng. Express 2026, 12, 015019.
78. Doolan, P.J.; Charalambous, S.; Roussakis, Y.; Leczynski, A.; Peratikou, M.; Benjamin, M.; Ferentinos, K.; Strouthos, I.; Zamboglou, C.; Karagiannis, E. A Clinical Evaluation of the Performance of Five Commercial Artificial Intelligence Contouring Systems for Radiotherapy. Front. Oncol. 2023, 13, 1213068.
79. Gondivkar, S.M.; Gadbail, A.R.; Chole, R.; Parikh, R.V. Adenoid Cystic Carcinoma: A Rare Clinical Entity and Literature Review. Oral Oncol. 2011, 47, 231–236.

Figure 1. Axial CT slice of a sample adenoid cystic carcinoma showing the segmentations: Original GTV, Cleaned GTV, and TRAD, highlighted in red. Original GTV was cleaned to include voxels within the soft tissue range (Hounsfield unit (HU): [−200, +600]). The blue box emphasizes the segmentation approaches under investigation. GTV: Gross tumor volume; TRAD: Tumor segmentation for radiomics.
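The contour cleaning described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the HU values and the binary contour mask are given as flat, equally long sequences and that both bounds of the [−200, +600] range are inclusive; the function name `clean_mask` and the toy values are hypothetical, and a real pipeline would apply the same rule to 3-D image arrays.

```python
HU_MIN, HU_MAX = -200, 600  # soft-tissue range stated in the Figure 1 caption


def clean_mask(hu_values, mask):
    """Keep only contour voxels whose HU value lies within [HU_MIN, HU_MAX]."""
    return [bool(m) and HU_MIN <= hu <= HU_MAX
            for hu, m in zip(hu_values, mask)]


# Invented toy voxels: air, fat-like, soft tissue, bone, and a voxel outside the contour.
toy_hu = [-1000, -150, 40, 700, 55]
toy_mask = [1, 1, 1, 1, 0]
cleaned = clean_mask(toy_hu, toy_mask)
print(cleaned)
```

Voxels outside the original contour stay excluded, so the cleaned mask is always a subset of the original one.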

Figure 2. ROC curves of the radiomics signatures derived from GTV and TRAD segmentations on CT volumes, shown for the logistic regression (LR), linear support vector machine (L-SVM), and random forest (RF) classifiers. The dotted line represents random-chance performance. GTV: Gross tumor volume; TRAD: Tumor segmentation for radiomics. * p-value from DeLong’s test.

Table 1. Summary of the key clinical and treatment characteristics of the study population. Median (interquartile range) and frequency (percentage) are provided for numeric and categorical variables, respectively. N.A.: Not available.

		Relapse
	Overall, N = 56 ¹	No, N = 36 ¹	Yes, N = 20 ¹	p-Value ²
Age [years]	61.2 (53.4, 70.1)	60.8 (53.8, 68.3)	62.0 (53.4, 70.8)	0.741
Gender				0.903
Male	23 (41%)	15 (42%)	8 (40%)
Female	33 (59%)	21 (58%)	12 (60%)
Solid Pattern				0.129
N.A.	8 (14%)	6 (17%)	2 (10%)
No	30 (54%)	22 (61%)	8 (40%)
Yes	18 (32%)	8 (22%)	10 (50%)
Perineural Spread	36 (64%)	21 (58%)	15 (75%)	0.212
Lesion Site				0.078
Major Salivary Glands	10 (18%)	9 (25%)	1 (5.0%)
Minor Salivary Glands	46 (82%)	27 (75%)	19 (95%)
T-Stage				0.622
1	3 (5.4%)	3 (8.3%)	0 (0%)
2	2 (3.6%)	1 (2.8%)	1 (5.0%)
3	4 (7.1%)	2 (5.6%)	2 (10%)
4	47 (84%)	30 (83%)	17 (85%)
N-Stage				0.296
0	50 (89%)	30 (83%)	20 (100%)
1	4 (7.1%)	4 (11%)	0 (0%)
2	2 (3.6%)	2 (5.6%)	0 (0%)
M-Stage				0.655
0	50 (89%)	33 (92%)	17 (85%)
1	6 (11%)	3 (8.3%)	3 (15%)
GTV [cc]	31.6 (10.2, 84.9)	18.4 (6.8, 65.4)	88.7 (22.3, 102.3)	0.007
Total Dose [Gy(RBE)]	70.0 (70.0, 70.0)	70.0 (70.0, 70.0)	70.0 (70.0, 70.0)	0.638
Follow-up Time [months]	26.4 (18.5, 36.5)	29.4 (19.4, 38.9)	22.6 (15.9, 31.1)	0.181

¹ Median (IQR); n (%). ² Wilcoxon rank sum exact test; Pearson’s Chi-squared test; Fisher’s exact test; Wilcoxon rank sum test.
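As a worked example of one of the tests named in the footnote, the sketch below computes a two-sided Fisher's exact p-value for the Lesion Site rows of Table 1 using only the Python standard library. It assumes that Fisher's exact test is the one applied to this variable (the footnote does not map tests to rows); the function name `fisher_exact_two_sided` is illustrative and is not taken from the article.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k with margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))


# Lesion Site in Table 1, as (no relapse, relapse) counts:
# major salivary glands 9 and 1, minor salivary glands 27 and 19
p_value = fisher_exact_two_sided(9, 1, 27, 19)
print(round(p_value, 3))  # 0.078
```

Under that assumption the result agrees with the p-value of 0.078 reported for Lesion Site.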

Table 2. Three-feature radiomics signatures selected for each combination of classifier (LR, L-SVM, and RF) and VOI (GTV and TRAD), along with AUC values and their absolute differences ∆ (here, ∆AUC = |AUC_GTV − AUC_TRAD|). Abbreviations are defined below the table.

Model	GTV Signature	GTV AUC [95% CI] (p-Value)	TRAD Signature	TRAD AUC [95% CI] (p-Value)	ΔAUC [IQR]
LR	‘glszm_SizeZoneNonUniformityNormalized’, ‘gldm_LargeDependenceHighGrayLevelEmphasis’, ‘gldm_DependenceVariance’	0.73 [0.52, 0.84] (0.011)	‘glcm_Imc1’, ‘firstorder_InterquartileRange’, ‘glcm_Imc2’	0.73 [0.55, 0.86] (0.015)	0.04 [0.01]
L-SVM	‘glcm_MCC’, ‘glcm_Imc1’, ‘glszm_GrayLevelVariance’	0.77 [0.45, 0.84] (0.008)	‘firstorder_Median’, ‘gldm_LargeDependenceLowGrayLevelEmphasis’, ‘gldm_DependenceEntropy’	0.80 [0.69, 0.93] (0.002)	0.05 [0.01]
RF	‘shape_Flatness’, ‘shape_MajorAxisLength’, ‘ngtdm_Busyness’	0.87 [0.69, 0.91] (0.001)	‘glcm_Imc1’, ‘shape_MajorAxisLength’, ‘glcm_MaximumProbability’	0.80 [0.72, 0.91] (0.006)	0.00 [0.00]

LR: Logistic regression, L-SVM: Support vector machine with linear kernel, RF: Random forest classifier, VOI: Volume of Interest, GTV: Gross tumor volume, TRAD: Tumor segmentation for radiomics, GLCM: Gray level co-occurrence matrix, GLSZM: Gray level size zone matrix, GLDM: Gray level dependence matrix, NGTDM: Neighborhood gray tone difference matrix, Imc: Informational measure of correlation, MCC: Maximal correlation coefficient.
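The ∆AUC definition in the Table 2 caption can be illustrated with a short sketch. The labels and scores below are hypothetical and are not study data. The AUC is computed here through its rank interpretation (the probability that a relapse case receives a higher score than a non-relapse case), which may differ from the estimator used in the article, and the sketch does not reproduce how the reported IQR was obtained.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def delta_auc(labels: Sequence[int], scores_gtv: Sequence[float],
              scores_trad: Sequence[float]) -> float:
    """Absolute AUC difference between the two VOIs, |AUC_GTV - AUC_TRAD|."""
    return abs(auc(labels, scores_gtv) - auc(labels, scores_trad))


# Hypothetical relapse labels (1 = relapse) and model scores, for illustration only
y = [0, 0, 0, 1, 1, 1]
gtv_scores = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]
trad_scores = [0.1, 0.3, 0.5, 0.6, 0.8, 0.9]
delta = delta_auc(y, gtv_scores, trad_scores)
print(auc(y, gtv_scores), auc(y, trad_scores), delta)
```

In this toy case the GTV scores rank 8 of the 9 case pairs correctly and the TRAD scores rank all 9 correctly, so the difference is 1/9.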
