Predicting Local Failure after Partial Prostate Re-Irradiation Using a Dosiomic-Based Machine Learning Model

The aim of this study is to predict local failure after partial prostate re-irradiation for the treatment of isolated locally recurrent prostate cancer by using a machine learning classifier based on radiomic features from pre-treatment computed tomography (CT), positron-emission tomography (PET) and biological effective dose distribution (BED) of the radiotherapy plan. The analysis was conducted on a monocentric dataset of 43 patients with evidence of isolated intraprostatic recurrence of prostate cancer after primary external beam radiotherapy. All patients received partial prostate re-irradiation delivered by volumetric modulated arc therapy. The gross tumor volume (GTV) of each patient was manually contoured from planning CT, choline-PET and dose maps. An ensemble machine learning pipeline including unbalanced data correction and feature selection was trained using the radiomic and dosiomic features as input for predicting occurrence of local failure. The model performance was assessed using sensitivity, specificity, accuracy and area under receiver operating characteristic curves of the score function in 10-fold cross validation repeated 100 times. Local failure was observed in 13 patients (30%), with a median time to recurrence of 36.7 months (range = 6.1–102.4 months). A four variables ensemble machine learning model resulted in accuracy of 0.62 and AUC 0.65. According to our results, a dosiomic machine learning classifier can predict local failure after partial prostate re-irradiation.


Introduction
Prostate cancer (PCa), the fourth most occurring cancer overall and the second most frequent in men [1], is commonly treated with surgery or radiation therapy (RT), including brachytherapy and external beam radiotherapy (EBRT). Advanced EBRT techniques were associated with a relevant improvement in tumor control and reduced severe toxicity rate, in the last decade [2,3]. However, following radiation treatment, about half patients can develop biochemical recurrence (BCR) of disease diagnosed via a prostate-specific antigen (PSA) test [4]. Additionally, a percentage ranging from 30-60% of patients with BCR experience an evidence of local disease progression on imaging [5,6]. Despite the existence of guidelines panels, supported by comprehensive literature reviews, there is no consensus on the optimal salvage therapy strategy [7,8]. An available treatment option is represented by partial prostate re-irradiation (PPR), a therapy scheme which combines the use of accurate medical images with stereotactic body radiation therapy (SBRT) to deliver ablative doses to a small region including the tumor volume [9,10].
Heterogeneous disease within and among patients demands a need for an individualized approach to PCa relapse. Precision medicine aims to design treatment plans tailored

Materials and Methods
The radiomics workflow (Figure 1) involves the following steps [37,38]: (i) definition of the clinical endpoint and suitable patient cohort, (ii) identification of appropriate image modalities for feature extraction, (iii) contouring of the region of interest (ROI) segmentation in a manual, semiautomatic or fully automatic manner, (iv) radiomic features extraction from the voxels included in the ROI, (v) training of ML algorithm to obtain a predictive model. The most relevant steps will be highlighted in the following sections.

Patient Data
Forty-three patients with evidence of isolated intra-prostatic recurrence of PCa were treated following a PPR protocol and retrospectively analyzed. All the patients met the following inclusion criteria: (1) previous EBRT as primary treatment, (2) evidence of biochemical recurrence (BCR) defined as a PSA rise of 2 ng/mL above the nadir [39], The radiomics framework depicts all the steps needed to build a radiomics model. After medical image acquisition and reconstruction, the region of interest (ROI) is segmented manually, semi-automatically or fully automatically. The feature extraction step, including post-processing of the acquired image, provides numerous quantitative biomarkers managed by means of artificial intelligence to achieve a clinical decision support system.

Patient Data
Forty-three patients with evidence of isolated intra-prostatic recurrence of PCa were treated following a PPR protocol and retrospectively analyzed. All the patients met the following inclusion criteria: (1) previous EBRT as primary treatment, (2) evidence of biochemical recurrence (BCR) defined as a PSA rise of 2 ng/mL above the nadir [39], (3) subsequent detection of isolated local relapse via (11C)-choline PET-CT, and (4) PPR as local salvage treatment. Furthermore, in order to ensure reproducible and repeatable results, we excluded from the analysis all the patients with (11C)-choline PET-CT not acquired in our institution because of inter-scanner variability or different image acquisition protocol.
As already extensively described [36], PPR treatment was performed delivering a volumetric modulated arc therapy (VMAT) technique with a course scheme of 35 Gy in seven daily fractions of 5 Gy, five fractions/week ( Figure 2); considering an alpha-beta ratio of 1.5 Gy, the total BED at the target was 151.7 Gy [40]. All patients were imaged with a non-contrast-enhanced pre-treatment CT using a 32-slice scanner (Toshiba Aquilion LB, Toshiba Medical Systems Europe, Zoetermeer, the Netherlands). The gross target volume (GTV) was defined with the aid of the (11C)-choline PET as the volume delineated with a semi-automated technique using a fixed threshold of 40% of the maximum signal intensity. To reduce inter-observer variability, minimal manual corrections of the GTV borders were performed to take into account anatomical differences due to possible sub-optimal spatial registration between different image modalities. The dose distributions were calculated on a resolution grid of 2.5 mm or less with Eclipse treatment planning system (TPS) version 13.6, using the Anisotropic Analytical Algorithm version 13.6.23 (Varian Medical System, Palo Alto, CA, USA). For each patient, PET and CT images, together with the dose distributions and GTV contours, were saved in the digital imaging and communications in medicine (DICOM) format. The following analysis, including data conversion, image post processing, feature calculation and model training and validation were performed in Matlab R2021b version: 9.12.0.1927505 R2022a (The MathWorks, Inc., Apple Hill Drive Natick, MA, USA). The radiomics framework depicts all the steps needed to build a radiomics model. After medical image acquisition and reconstruction, the region of interest (ROI) is segmented manually, semi-automatically or fully automatically. The feature extraction step, including post-processing of the acquired image, provides numerous quantitative biomarkers managed by means of artificial intelligence to achieve a clinical decision support system.
As already extensively described [36], PPR treatment was performed delivering a volumetric modulated arc therapy (VMAT) technique with a course scheme of 35 Gy in seven daily fractions of 5 Gy, five fractions/week ( Figure 2); considering an alpha-beta ratio of 1.5 Gy, the total BED at the target was 151.7 Gy [40]. All patients were imaged with a non-contrast-enhanced pre-treatment CT using a 32-slice scanner (Toshiba Aquilion LB, Toshiba Medical Systems Europe, Zoetermeer, the Netherlands). The gross target volume (GTV) was defined with the aid of the (11C)-choline PET as the volume delineated with a semi-automated technique using a fixed threshold of 40% of the maximum signal intensity. To reduce inter-observer variability, minimal manual corrections of the GTV borders were performed to take into account anatomical differences due to possible sub-optimal spatial registration between different image modalities. The dose distributions were calculated on a resolution grid of 2.5 mm or less with Eclipse treatment planning system (TPS) version 13.6, using the Anisotropic Analytical Algorithm version 13.6.23 (Varian Medical System, Palo Alto, CA, USA). For each patient, PET and CT images, together with the dose distributions and GTV contours, were saved in the digital imaging and communications in medicine (DICOM) format. The following analysis, including data conversion, image post processing, feature calculation and model training and validation were performed in Matlab R2021b version: 9.12.0.1927505 R2022a (The MathWorks, Inc., Apple Hill Drive Natick, MA, USA).
Tumor response after PPR was based on PSA level dosage on subsequent follow-up, recorded 3 months after therapy, quarterly for the successive 2 years, biannually until the fifth year and annually thereafter. During follow-up, according to Phoenix Consensum criteria [39], BCR occurred when PSA nadir + 2 ng/mL was observed. In the case of BCR, patients were scanned with an additional (11C)-choline-PET-CT to evaluate for local, regional, or metastatic failure. All recurrences occurred within the treatment field, except for one, which was observed outside the treatment field. Patient characteristics are summarized in Table 1. Tumor response after PPR was based on PSA level dosage on subsequent follow-up, recorded 3 months after therapy, quarterly for the successive 2 years, biannually until the fifth year and annually thereafter. During follow-up, according to Phoenix Consensum criteria [39], BCR occurred when PSA nadir + 2 ng/mL was observed. In the case of BCR, patients were scanned with an additional (11C)-choline-PET-CT to evaluate for local, regional, or metastatic failure. All recurrences occurred within the treatment field, except for one, which was observed outside the treatment field. Patient characteristics are summarized in Table 1. Abbreviations: PSA-prostate-specific antigen; BCR-biochemical recurrence; PPR-partial prostate re-irradiation; LF-local failure.

Image Processing
It is well known how different choices during features calculation procedures can impact the ML analysis [41,42]. To standardize this practice, we designed an image processing scheme, i.e., the sequence of actions needed to obtain image biomarkers from medical images, following the methodology and definitions of the Image Biomarker Standardization Initiative (IBSI) [43].

Image Processing
It is well known how different choices during features calculation procedures can impact the ML analysis [41,42]. To standardize this practice, we designed an image processing scheme, i.e., the sequence of actions needed to obtain image biomarkers from medical images, following the methodology and definitions of the Image Biomarker Standardization Initiative (IBSI) [43].
Firstly, in order to work with more meaningful information, raw data from PET and dose distribution from TPS were corrected voxel-wise in standard uptake value (SUV) and BED (α/β-ratio of 1.5 Gy), respectively [44]. Because GTV contours within DICOM RTSTRUCT files were stored as a set of points defining a polygon for each image slice, an in-house segmentation algorithm was used to find out which image voxel makes up the binary ROI mask. The chosen method is based on a crossing-number test [45]. It counts the number of times a line starting from the point and going in a fixed direction intersects with the polygonal boundary; all the points with odd counts are considered part of the ROI mask.
PET, CT, BED distribution and the corresponding ROI masks were also interpolated, using a trilinear algorithm, to 1 × 1 × 1 mm 3 isotropic voxel size to provide a texture features comparable across the cohort and make measurement reproducible [46]. A threshold of 0.5 was used to binarize ROI mask voxels to account for partial volume effects introduced by the trilinear interpolation. Subsequently, since Hounsfield units (HU) do not take non-integer values, CT data were rounded to the nearest integer, whereas no re-segmentation was applied. ROI masks were then utilized to isolate the voxels within the GTV from the surrounding intensities, that were replaced by NaN values.
Voxel intensities within the ROI were discretized into 64 grey levels in order to normalize their values among patients and reduce noise. A fixed bin number discretization approach was preferred because allows a direct comparison of feature values across samples [47]. A total of 380 radiomic features were calculated for each patient using an in-house Matlab code previously benchmarked with different phantoms to quantify the IBSI compliance level [48].

Machine Learning Model
For ML analysis, outcome was binarized considering patient status at three years from the end of the PPR treatment. Local relapse events after PPR were considered as positive whereas relapse-free patients were considered as negative. Radiomic features dataset from PET, CT and BED was used to build an ML model for treatment response classification. As shown in Figure 3, the ML procedure involves three steps: oversampling of the minority class, feature selection and model building from the selected features.
RTSTRUCT files were stored as a set of points defining a polygon for each image slice, an in-house segmentation algorithm was used to find out which image voxel makes up the binary ROI mask. The chosen method is based on a crossing-number test [45]. It counts the number of times a line starting from the point and going in a fixed direction intersects with the polygonal boundary; all the points with odd counts are considered part of the ROI mask.
PET, CT, BED distribution and the corresponding ROI masks were also interpolated, using a trilinear algorithm, to 1 × 1 × 1 mm 3 isotropic voxel size to provide a texture features comparable across the cohort and make measurement reproducible [46]. A threshold of 0.5 was used to binarize ROI mask voxels to account for partial volume effects introduced by the trilinear interpolation. Subsequently, since Hounsfield units (HU) do not take non-integer values, CT data were rounded to the nearest integer, whereas no re-segmentation was applied. ROI masks were then utilized to isolate the voxels within the GTV from the surrounding intensities, that were replaced by NaN values.
Voxel intensities within the ROI were discretized into 64 grey levels in order to normalize their values among patients and reduce noise. A fixed bin number discretization approach was preferred because allows a direct comparison of feature values across samples [47]. A total of 380 radiomic features were calculated for each patient using an inhouse Matlab code previously benchmarked with different phantoms to quantify the IBSI compliance level [48].

Machine Learning Model
For ML analysis, outcome was binarized considering patient status at three years from the end of the PPR treatment. Local relapse events after PPR were considered as positive whereas relapse-free patients were considered as negative. Radiomic features dataset from PET, CT and BED was used to build an ML model for treatment response classification. As shown in Figure 3, the ML procedure involves three steps: oversampling of the minority class, feature selection and model building from the selected features. The first step was necessary because local failure rate was 30.2% resulting in an imbalanced dataset. In this condition, the models can be prone to prefer prediction of the dominant class to boost performance during the training phase [49]. To balance the distribution of majority and minority classes, an adjustment strategy using the adaptive synthetic (ADASYN) method was adopted on training data set [50].
Then, feature selection was performed by means of Neighborhood component analysis (NCA) [51], a regularized non-parametric algorithm which learns features' weights The first step was necessary because local failure rate was 30.2% resulting in an imbalanced dataset. In this condition, the models can be prone to prefer prediction of the dominant class to boost performance during the training phase [49]. To balance the distribution of majority and minority classes, an adjustment strategy using the adaptive synthetic (ADASYN) method was adopted on training data set [50].
Then, feature selection was performed by means of Neighborhood component analysis (NCA) [51], a regularized non-parametric algorithm which learns features' weights by minimizing an objective function that measures the average leave-one-out classification error. The regularization term, needed to avoid overfitting, was set to 0.02 while the solver type chosen for estimating feature weights was the stochastic gradient descent algorithm.
The final step was outcome modeling by ensemble ML (EML). EML is an approach that seeks better predictive performance by aggregating predictions from multiple weak learners [52]. A well-known system to combine weak learners is boosting, a method which learns them sequentially in an adaptive way: a model at a given step is built giving more importance to observations in the dataset that were previously badly handled. Among ensemble aggregation techniques, RobustBoost was preferred because it is significantly more robust against label noise [53]. A threshold of 5% was used as classification error target, while the maximum number of iterations, often called "learning cycles", was set to 500. For training the ensemble models, decision trees with a maximum number of split set to 4 were used as weak learners [54].
To yield robust generalized performance of the trained models, a 100-times-repeated 10-fold cross-validation (CV) was performed. During each of the CV iterative loops, the training set (9/10 of the whole dataset) was employed to perform the unbalanced data correction. In this phase, to avoid data leakage [55], the NCA feature selection to obtain the highest weighted variables used to train the EML models was also implemented. Lastly, the testing set (1/10 of the whole dataset) was utilized to obtain predictions by means of the trained classifiers ( Figure 3). Furthermore, model performance was evaluated relatively to the increasing number of features. With this goal, the CVs were run increasing the variable number up to a maximum of 8 sorted in the feature selection process.
The model was assessed by estimating sensitivity (true positive rate), specificity (true negative rate), accuracy (fraction of the events correctly classified) and the area under curve (AUC) from the receiver operating characteristic (ROC) analysis [56]. To obtain a global model performance, average values and standard deviations (σ) of these scores were computed over all the repeated CVs. At the end, the best predictive classifier was considered as the one with the highest accuracy.

Results
Model scores obtained which increased the feature number are shown in Figure 4. The model with best performance was built by a radiomic signature based on four variables, showing an AUC of 0.68 (±0.06 σ), with a sensitivity, specificity, and accuracy of 0.61 (±0.11 σ), 0.63 (±0.06 σ), and 0.63 (±0.06 σ) respectively. The ROC curve of the selected EML and the most frequently chosen radiomic features during repeated CV are shown in Figures 5 and 6, respectively. Two predictors, "Variance" and "Energy", were intensitybased statistical features, while "Integrated Intensity" and "Large zone high gray level emphasis" were morphological and textural features, respectively [43]. All the variables were extracted from BED distributions.

Discussion
Radiomics has recently emerged as a tool for image analysis which converts medical images into quantitative descriptors [57]. PCa radiomics has been largely applied, mostly using multiparametric MRI, for automated and non-invasive tumor localization and characterization [58]. Treatment response prediction by radiomic features was also the aim of several studies. For instance, Alongi et al. [28] demonstrated how radiomic features from PET/CT imaging are associated with PCa patients' outcome. Similarly, Abdollahi at al. developed a model based on MRI radiomic features to predict EBRT response, Gleason Score and PCa stage [59]. Since the use of imaging is vital to diagnose and stage recurrence, in particular using multiparametric MRI and PET/CT [60], it can be postulated that radiomics analysis could provide useful information to predict its outcome. In particular, PET/CT is known to provide useful data regarding the metabolic and morphologic phenotype of the recurrent tumor. BED distribution was found to accurately describe the radiotherapy treatment: R. J. Klement et al. correlated the near-minimum and near-maximum dose prescribed to the planning target volume and their average to the risk of local recurrence of non-small cell lung cancer [61].
The inclusion of BED maps delivered during EBRT is an effective way for improving the predictive power of AI-based classifiers. Avanzo et al. [62] highlighted this point by showing how the integration of radiomic and machine-learned predictors from BED maps is advantageous, relative to the use of only CT-derived features, in increasing the predictive performances of both ML and deep learning models. Likewise, Welch et al. [63] demonstrated the effectiveness of the aggregation of features derived from different kinds of data (in this case clinical, radiomic and dosiomics) to predict the loco-regional failure of head and neck cancer.

Discussion
Radiomics has recently emerged as a tool for image analysis which converts medical images into quantitative descriptors [57]. PCa radiomics has been largely applied, mostly using multiparametric MRI, for automated and non-invasive tumor localization and characterization [58]. Treatment response prediction by radiomic features was also the aim of several studies. For instance, Alongi et al. [28] demonstrated how radiomic features from PET/CT imaging are associated with PCa patients' outcome. Similarly, Abdollahi at al. developed a model based on MRI radiomic features to predict EBRT response, Gleason Score and PCa stage [59]. Since the use of imaging is vital to diagnose and stage recurrence, in particular using multiparametric MRI and PET/CT [60], it can be postulated that radiomics analysis could provide useful information to predict its outcome. In particular, PET/CT is known to provide useful data regarding the metabolic and morphologic phenotype of the recurrent tumor. BED distribution was found to accurately describe the radiotherapy treatment: R. J. Klement et al. correlated the near-minimum and near-maximum dose prescribed to the planning target volume and their average to the risk of local recurrence of non-small cell lung cancer [61].
The inclusion of BED maps delivered during EBRT is an effective way for improving the predictive power of AI-based classifiers. Avanzo et al. [62] highlighted this point by showing how the integration of radiomic and machine-learned predictors from BED maps is advantageous, relative to the use of only CT-derived features, in increasing the predictive performances of both ML and deep learning models. Likewise, Welch et al. [63] demonstrated the effectiveness of the aggregation of features derived from different kinds of data (in this case clinical, radiomic and dosiomics) to predict the loco-regional failure of head and neck cancer.
As such, the purpose of this study was to explore the potential of imaging and dose data of recurrent PCa to for predicting treatment response to EBRT, where the endpoint for response was a new clinical recurrence at three years from the end of the therapy. The obtained results, although the model performance on the repeated validation was only fair, demonstrates that radiomics-based ML models can help in identifying recurrent tumors at highest risk of lack of response.
As is well known, overfitting is a typical fault of ML models, meaning that the model precisely follows the data random noise instead of detecting patterns or trends of data [64]. To build the radiomic signature and prevent overfitting, the most meaningful variables were chosen by using NCA feature selection on CV (Figure 3). The radiomic model with best performance showed an accuracy of 0.62 (±0.06 σ) and was derived by four radiomic features which were all selected from the dose distribution. Because of this result, the classifier can be considered as a dosiomic model, suggesting that local failure-free survival is strictly correlated to the BED pattern within the RT target. The four most frequently selected dosiomic variables were: "Variance" and "Energy", two first order statistical features describing the spread and magnitude of BED distribution in the tumor; "Integrated intensity", the average dose intensity in the GTV multiplied by its volume; "Large Zone High Gray Level Emphasis" (LZHGE) that describes the emphasis of high-level BED zones. These results are in agreement with previous findings where the highest BED was proven to increase relapse-free survival after brachytherapy [65]. Dose escalation has been shown to improve the response of PCa. Moreover, PCa is believed to be sensitive to fractionation schedule, also suggesting the superiority of BED over dose for treatment analysis [66].
Our study has several limitations. Although inclusion criteria produced a homogeneous patient cohort, this resulted in a small dataset. In order to overcome this issue, a CV approach was implemented to assess a more robust model performance. Moreover, the result could be affected by selection bias because of the retrospective nature of the study. Another restriction of this work is the relative short follow-up of the patient cohort; this implies that patients with late relapse are categorized wrongly as relapse-free. Therefore, short follow-up can represent a confounding factor for predictive analyses. Finally, the patient population was imbalanced in favor of relapse-free patients. Despite mitigating the effect of imbalance by using ADASYN algorithm, the model would benefit from a more balanced dataset. A promising strategy to further increase the predictive power is to widely investigate the patient dataset. Other clinical factors or features, such as those extracted from genomes can also be found to be associated with isolated local recurrence incidence after PPR, providing a deeper knowledge [67]. Therefore, their integration into the model can improve the prediction capability and robustness.
Despite its limitations, to the best of our knowledge, this study offers a proof-ofconcept of using an ML radiomics-based model to predict treatment response after reirradiation of a radio-recurrent PCa. A particularly attractive process in which our ML approach could find its application is the tumor dose painting (DP) [68,69]. It is well known that the common planning process involves patient preparation, simulation CT-scan, target and organ at risk delineation and dose distribution calculation; all these steps are used to create an individualized radiotherapy plan according to the treatment prescription. Instead of prescribing a standard dose, it could be possible to apply a DP procedure where the prescribed dose is adapted according to the presumed BED spatial dependences within the tumor found by our predictive model. These findings open new scenarios for developing decision support systems able to help experts in defining the better salvage strategy for managing recurrent tumors.

Conclusions
A dosiomic-based classifier was developed to predict therapy response at three years from the end of PPR. Dose-based predictors contain important details related to patient outcome. Future developments such as an inter-scanner calibration and an external validation are needed prior its inclusion into clinical practice.