Prediction of Overall Survival in Cervical Cancer Patients Using PET/CT Radiomic Features

Background: Radiomics is a field of medical research and data science in which quantitative imaging features are extracted from medical images and subsequently analyzed to develop models that provide diagnostic, prognostic, and predictive information. The purpose of this work was to develop a machine learning model to predict the survival probability of 85 cervical cancer patients using PET and CT radiomic features as predictors. Methods: Initially, the patients were divided into two mutually exclusive sets: a training set containing 80% of the data and a testing set containing the remaining 20%. The entire analysis was conducted separately for CT and PET features. Genetic algorithms and LASSO regression were used to perform feature selection on the initial PET and CT feature sets. Two different survival models were employed: the Cox proportional hazard model and random survival forest. The Cox model was built using the subset of features obtained with the feature selection process, while all the available features were used for the random survival forest model. The models were trained on the training set; cross-validation was used to fine-tune the models and to obtain a preliminary measurement of the performance. The models were then validated on the test set, using the concordance index as the metric. In addition, alternative versions of the models were developed using tumor recurrence as an adjunct feature to evaluate its impact on predictive performance. Finally, the selected CT and PET features were combined to build a further Cox model. Results: The genetic algorithm was superior to the LASSO regression for feature selection. The best performing model was the Cox model built using the selected CT features; it achieved a concordance index score of 0.707. With the addition of tumor recurrence as a predictive feature, the Cox CT model reached a concordance index score of 0.776. PET features, however, proved to be inadequate for survival prediction. The CT model performed better than the model with combined PET and CT features. Conclusions: The results showed that radiomic features can be used to successfully predict survival probability in cervical cancer patients. In particular, CT radiomic features proved to be better predictors than PET radiomic features in this specific case.


Introduction
Radiomics refers to the analysis of quantitative features extracted from medical images, including computerized tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI). Radiomic features can be used to build models that provide valuable diagnostic, prognostic, and predictive information [1,2]. In the last few years, radiomic features were shown to be promising predictors in several cancer-related studies [3], providing quantitative measures of tumor characteristics that would otherwise be inaccessible. Kan et al. [4] showed that MRI radiomic features can be used to predict lymph node metastasis in early-stage cervical cancer. A study by Mu et al. [5] on cervical cancer patients revealed that it is possible to build machine-learning models using PET radiomic features to classify early-stage and advanced-stage tumors. Lucia et al. [6] showed that some of the radiomic features extracted from PET/CT and MRI were significant predictors of recurrence in patients with cervical cancer, with higher prognostic power than usual clinical variables. Other studies have also investigated the correlation between PET/CT radiomic features and cervical cancer histological types [7,8].
Cervical cancer (CC) is the fourth most frequently diagnosed cancer and the fourth most common cause of death from cancer in women, with an estimated 604,000 new cases and 342,000 deaths worldwide in 2020 [9].
Concurrent chemoradiation is the standard treatment option in locally advanced CC (LACC), allowing local control of the disease in 70-80% of patients, with 66% and 58% reaching 5-year overall survival (OS) and disease-free survival (DFS), respectively [10,11]. 18F-FDG PET/CT has a well-established role in the management of patients with cervical cancer, especially in staging, treatment response, and recurrence [12,13]. However, the prognostic role of quantitative analysis of these images in determining the OS of LACC patients is still under investigation and requires additional studies [14].
The aim of this work was to use radiomic features extracted from PET/CT images to predict the risk of death in 85 cervical cancer patients. The patients were retrospectively selected for the study and were followed for a time span ranging from a minimum of 6 to a maximum of 105 months. We developed and compared different feature selection methods to select the most informative variables for training a predictive model. Feature selection is a crucial step in radiomic analyses because the number of features commonly exceeds the number of patients in medical applications, which makes the problem ill-posed. We used survival and the associated survival times as labels for our survival models. The recurrence score is a clinical feature, i.e., a feature not quantified by radiomic analysis but included as an extra feature for our models and estimated at the time of collection of the PET/CT images. Two types of models were employed for prediction: the Cox proportional hazards model [15] and the random survival forest [16]. We selected these two types of models because they are standard tools used in survival analysis and allowed us to compare the behavior of linear and ensemble models while preserving easy explainability of the results. The models were designed to predict the risk associated with each patient: a higher risk corresponded to a lower predicted survival time. We also developed alternative versions of the final models using tumor recurrence as an adjunct feature to evaluate its impact on predictive performance.

Patient and Treatment
All LACC patients underwent pelvic 3D conformal external beam radiotherapy with a prescribed dose to the pelvis of 46 Gy in 23 fractions and a brachytherapy boost, with the high dose-rate or pulsed dose-rate technique. The treatment volumes were delineated as follows: the clinical target volume (CTV) was defined as the gross tumor volume, uterus, parametria, upper third of the vagina, and the obturator, presacral, iliac (common, internal, and external), and para-aortic lymph nodes (if PET-positive) with an expansion of 7 mm. The CTV boost was defined as lymphadenopathies with high FDG uptake plus an expansion of 7 mm. The planning target volumes (PTV and PTVboost) were obtained by adding a 1 cm isotropic expansion to the CTV. Examples of semiautomated delineation of PTVboost contours based on the baseline PET/CT images are reported in Figure 1. Chemotherapy with cisplatin (40 mg/m²) was administered once per week during the period of external beam radiotherapy.

Data Acquisition
We collected pretherapy whole-body 18F-FDG PET/CT scans from 85 patients in this retrospective study at the IRCCS Azienda Ospedaliero-Universitaria di Bologna. The study was conducted according to the guidelines of the Declaration of Helsinki, all patients signed an informed consent, and the local ethical committee of Sant'Orsola-Malpighi Hospital of Bologna approved this study (CE 322/2019/Oss/AUOBo). A total of 3 MBq/kg of 18F-FDG was intravenously injected. The uptake time was 60 min in all of the patients. Images were acquired on a 3D tomograph (Discovery STE; General Electric) for 2 min per bed position. A low-dose CT scan (120 kV, 80 mA) was performed both for attenuation correction and to provide an anatomical map. PET images were reconstructed using an iterative 3D ordered subset expectation maximization method with two iterations and 20 subsets, followed by smoothing (with a 6 mm 3D Gaussian kernel) with CT-based attenuation, scatter, and random coincidence event correction.
The PET edge algorithm implemented in MIM Software (MIM Software Inc., Cleveland, OH, USA) was used for automatic lesion segmentation on PET images. The obtained automatic contours were manually validated by expert nuclear medicine physicians, adjusted to include the entire tumor areas, and rigidly transferred to the coregistered CT images.
The images and the structure sets in DICOM format were imported into 3D Slicer software [17] for the extraction of radiomic features from both PET and CT scans; the radiomics module of 3D Slicer is based on the Pyradiomics Python library [18] and allows for the extraction of shape-based, first-order, and texture-based features.

Database Description and Pre-Processing
For each patient, a set of 105 CT and PET radiomic features was collected. Each patient was labelled with two outcomes: survival and survival time. For each patient, we also collected clinical information on recurrence to use as an extra feature in our survival models. In our analyses, we used survival and survival time as required outcomes for the survival models, while recurrence was used as a clinical feature along with the radiomic features. Among the 85 patients, 24 died during the study period, and the others were censored. The dataset was split into two disjoint subsets: a training set (80%) and a test set (20%). We standardized the set of features according to the mean and standard deviation of the training set.
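The split and standardization described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the helper names (`train_test_split`, `fit_scaler`, `standardize`), the random seeds, and the toy feature values are all assumptions. The point it shows is that the mean and standard deviation are estimated on the training set only and then reused for the test set, so no test-set information leaks into preprocessing.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into disjoint train/test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

def fit_scaler(train_rows):
    """Per-feature mean and standard deviation, computed on the training set only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply (x - mean) / std feature by feature."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Synthetic stand-in for the radiomic table: 85 patients, 3 toy features.
rng = random.Random(42)
features = [[rng.gauss(10, 2), rng.gauss(-1, 5), rng.gauss(0, 1)] for _ in range(85)]

train, test = train_test_split(features)   # 80% / 20%
means, stds = fit_scaler(train)            # statistics from the training set
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)  # same statistics reused on the test set
```

After this step the training features have zero mean and unit variance by construction, while the test features are only approximately centered, which is the expected behavior.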

Feature Selection
Two different techniques were used and compared to select the best subset of radiomic features: LASSO regression [19] and a genetic algorithm [20]. We chose the LASSO regression because it is a standard method for feature selection. The LASSO regression selects features by minimizing a loss function with an L1 penalty. In contrast, the genetic algorithm allows customization of the loss function, which can be tailored to the specific application.
We used a Cox proportional hazards model with a LASSO penalty, also known as an ℓ1 penalty, to fit the training data. The number of features to preserve was parametrized by the penalization coefficient of the LASSO regression: a higher penalty leads to fewer features being preserved. We used 100 penalization values evenly spaced in [10⁻⁴, 1]. For each value, a Cox model was trained and evaluated using 5-fold cross-validation. We repeated the cross-validation procedure 5 times for each Cox model to give a more reliable estimate of the model performance. The parameter yielding the best cross-validation result was selected as the optimal one. Then, the best Cox model found was used for the estimation of the β parameters: the LASSO model filters out all of the features whose β coefficient is zero.
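The penalty search can be sketched as below. This is an illustrative skeleton under stated assumptions, not the implementation used in the study: `linspace`, `repeated_kfold`, and `select_penalty` are hypothetical helpers, and `score_fn` stands in for "fit a LASSO-penalized Cox model on the training indices and return its concordance index on the validation indices", which is not implemented here. A toy scorer is used only so that the sketch runs end to end.

```python
import random

def linspace(start, stop, num):
    """num evenly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, val_idx) for n_repeats reshuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            val_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, val_idx

def select_penalty(penalties, n_samples, score_fn):
    """Return the penalty with the best mean score over the repeated CV."""
    best_penalty, best_score = None, float("-inf")
    for penalty in penalties:
        scores = [score_fn(penalty, tr, va) for tr, va in repeated_kfold(n_samples)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_penalty, best_score = penalty, mean_score
    return best_penalty, best_score

# 100 penalization values evenly spaced in [1e-4, 1], as described in the text.
penalties = linspace(1e-4, 1.0, 100)

# Toy scorer (assumption): a real one would train and evaluate a Cox model.
def toy_score(penalty, train_idx, val_idx):
    return -(penalty - 0.5) ** 2

best_penalty, best_score = select_penalty(penalties, n_samples=68, score_fn=toy_score)
```

With 5 folds and 5 repeats, each penalty value is scored on 25 train/validation splits, and the mean of those 25 scores drives the selection.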
We developed a genetic algorithm (GA) for the feature selection procedure. Each genome identified a putative set of features, expressed as a binary pattern over the considered features. Features associated with non-null genes were preserved; null genes identified features that were excluded. Each genome was assessed using a Cox model. The resulting model was evaluated using a metric function, and the scores of the models were used as fitness values. The first model evaluation was performed using the Akaike information criterion (AIC) [21] as a metric (Figure 2). The AIC score penalizes models with a high number of features. We estimated the best model, and the corresponding subset of features, as the one with the lowest AIC score. Using the subset of features obtained by the first filtering, a second reduction was performed by maximizing the fitness function (1), which depends on µ(CV5) and σ(CV5), the average and standard deviation, respectively, of the scores obtained by the model in a 5-fold cross-validation repeated five times. We used the concordance index (CI) metric for the evaluation of the cross-validation. The concordance index [22] is defined as:

$$\mathrm{CI} = \frac{\text{number of concordant pairs}}{\text{number of concordant pairs} + \text{number of discordant pairs}}$$

Two patients were correctly ordered if the patient with the higher predicted risk experienced the death event (binary score) before the patient with the lower predicted risk. The concordant pairs were the pairs of patients correctly ordered by the model, while the discordant pairs were those that were incorrectly ordered. The fitness function (1) aimed to maximize the overall score obtained in the repeated cross-validation while achieving a similar score for all the cross-validation folds, by minimizing the standard deviation.
We performed a 5-fold cross-validation repeated five times for the evaluation of the Cox model, i.e., the identification of the best feature subset, using the concordance index as an evaluation metric.
CT and PET features were independently selected and used to build independent models to evaluate their different predictive power. The two selected feature subsets were merged, further reduced by maximizing (1) with the GA, and used to build a comprehensive Cox model to estimate the predictive capability of their combination. All the selected features were included in the final models, except for the ones with a p-value greater than 0.2, which were manually excluded.
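The concordance index described above, counting correctly and incorrectly ordered patient pairs, can be sketched in a few lines. Two points in this sketch are assumptions rather than statements from the text: pairs with identical predicted risk are counted as half-concordant (a common convention), and the GA fitness is taken to be the mean minus the standard deviation of the cross-validation scores, since the exact form of the fitness function (1) is not reproduced in this text. The function names are hypothetical.

```python
import statistics

def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs ordered correctly by predicted risk.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher risk = shorter expected survival)
    """
    concordant = discordant = tied = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] < risks[j]:
                    discordant += 1
                else:
                    tied += 1  # ASSUMPTION: risk ties count as half-concordant
    comparable = concordant + discordant + tied
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable

def fitness(cv_scores):
    """ASSUMED fitness: reward a high mean CI and penalize spread across folds."""
    return statistics.fmean(cv_scores) - statistics.pstdev(cv_scores)

# Toy example: four patients, the third one censored at 15 months.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
ci = concordance_index(times, events, risks)
```

In the toy example there are five comparable pairs, four of which are ordered correctly, so the index is 4/5. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.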

Survival Prediction
We applied penalized Cox and random survival forest (RSF) models for the prediction of patients' overall survival. The Cox and RSF models were applied to the CT and PET feature sets selected by the GA separately and to their union. The RSF is an ensemble model and, therefore, it should not require a prior feature selection step.
The main parameter required to tune the Cox model is the regression penalty. One hundred Cox models were trained with different ℓ2 penalization values ranging from 10⁻³ to 1. The models were evaluated using a 5-fold cross-validation repeated five times. The parameters yielding the best cross-validation results were chosen as the optimal ones.
A grid-search strategy was employed to find the optimal parameters for the RSF model. The grid-search algorithm tests different parameter configurations and selects the one that maximizes the repeated 5-fold cross-validation score. We used the concordance index as the metric for cross-validation evaluation. Each configuration specified the number of estimators in the forest, the minimum number of samples required to split a node, and the maximum number of features to consider when looking for the best split of a node.
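A grid search over these three parameters can be sketched as an exhaustive loop over their Cartesian product. The candidate values below are purely hypothetical, since the grid actually used is not reported in this text, and `toy_score` is a placeholder for "train an RSF with these parameters and return its mean concordance index over the repeated 5-fold cross-validation".

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values (assumption, not taken from the study).
param_grid = {
    "n_estimators": [100, 500, 1000],
    "min_samples_split": [5, 10, 20],
    "max_features": ["sqrt", "log2"],
}

# Placeholder scorer (assumption): a real one would cross-validate an RSF model.
def toy_score(params):
    return (
        -abs(params["n_estimators"] - 500) / 1000
        - abs(params["min_samples_split"] - 10) / 100
        + (0.01 if params["max_features"] == "sqrt" else 0.0)
    )

best_params, best_score = grid_search(param_grid, toy_score)
```

The cost grows with the product of the grid sizes (18 configurations here), each multiplied by the 25 model fits of the repeated cross-validation, which is why the grid is usually kept small.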
The predictive capabilities of the best Cox and RSF models were validated on the test set that was set aside at the beginning of the study, using the concordance index as the metric.

Results
The cross-validation results obtained with the different subsets of selected features are summarized in Tables 1 and 2. The scores obtained with the entire set of features were used as a reference for comparison (see the last row in Tables 1 and 2). The cross-validation results show that the most promising subsets of features were the ones selected by the GA for both PET and CT features. For CT features, the GA selected seven textural features and two first-order features. For PET features, instead, the GA selected three shape-based features, two textural features, and four first-order features. The PET and CT features selected by the GA, with the corresponding Cox coefficients and statistics, are shown in Tables 3 and 4, respectively. The combined set of selected features is shown in Table 5. The results obtained by the best models on the test data are shown in Table 6. The addition of recurrence as an extra feature produced a general improvement in all model performances. The results obtained on the test data using recurrence as an extra feature are shown in Table 7.
Table 6. Results obtained by the models trained with only radiomic features, i.e., without the recurrence feature, evaluated on the test sets. The metric used for evaluation was the concordance index.

Discussion
The Cox model built on CT features obtained a CI of 0.707. With recurrence as an additional input variable, it obtained a CI of 0.776, outperforming all the other proposed models. The RSFs performed poorly (CI ≈ 0.5), and the addition of recurrence did not produce a significant improvement. The RSFs' results proved the importance of the feature selection task in radiomic applications.
The Cox PET model without the recurrence feature obtained a nonsignificant CI score (<0.5). The low efficiency of the Cox PET model was probably due to the limited PET scanner resolution. It was shown [23] that for small tumors (volume 10 cm³), textural PET radiomic features are highly correlated with the tumor volume and add almost no further information, which makes the quantification of intratumoral heterogeneity inadequate [3]. The inclusion of recurrence in the Cox PET model produced a considerable improvement in performance (CI = 0.62, i.e., +30%). The Cox models, contrary to the RSF models, were built using smaller subsets of features, so the addition of a strong predictor had a greater impact on the performance. Nevertheless, the performance of the Cox PET model was not as good as that of the Cox CT model. The CT model also performed better than the model with combined PET and CT features.
The inclusion of recurrence along with standard radiomic features produced a significant improvement in predictive performance. The radiomic features quantified the patient state at a particular time point, thus losing the information about patient history. The recurrence score allowed for the inclusion of this history information in survival evaluation. The recurrence clinical feature could not be directly evaluated by radiomic analysis; therefore, it may provide a significant boost to the survival models' predictions.

Conclusions
The results showed that the subset of radiomic features analyzed in this study can be successfully used to predict overall survival in cervical cancer patients. In particular, the Cox model trained with the CT radiomic features selected by the GA achieved a significant result on test data, with a CI score of 0.707. Moreover, the addition of recurrence as an extra feature further improved the predictive power of the model (+7%). The difference between the performance of the CT and PET Cox models confirmed the low efficiency of PET features in capturing small details of tumor structure due to the low resolution of the data. This was also confirmed by the type of features selected by the GA for the two feature sets. For CT, seven of the nine selected features were textural features, while only two textural features were selected for PET.
The scores achieved by the RSF models highlight the importance of the feature selection task. These ensemble models, trained using all the available features, performed substantially worse than the Cox CT model.
The proposed GA was significantly better in feature selection than the widely used LASSO regression. The integration of the GA with the Cox proportional hazard model allowed the selection of meaningful predictive features, addressing the problem of dimensionality reduction required by radiomic applications. Furthermore, the GA feature selection technique developed in this study can be adapted to different types of models and different metrics with minimum effort.
Further improvements can be achieved by including a feature importance criterion in the proposed pipeline and by repeating the feature extraction procedure several times. The developed pipeline may also be employed to predict the probability of recurrence, after the definition of an inclusion criterion for the subjects who experienced recurrence more than once. Finally, the model may be further improved by including other clinical variables and investigating their relationships with the radiomic features already found.
The results obtained with the proposed fully automated pipeline support its suitability for use in clinical practice and the effectiveness of radiomic analysis in PET/CT imaging. However, the results presented in this work must be validated by a multicenter study with a larger cohort of patients.

Figure 1. Fused PET/CT axial plane scans of two patients selected in the study. We highlight the tumor area (e.g., PTVboost) identified by expert radiologists with the blue contours.

Figure 2. Trend of genetic algorithm fitness values for 200 iterations. The fitness of each genome is the partial AIC of a Cox model fitted with the features selected by that genome. The image shows the fitness minimization of the PET feature set. The minimum and average fitness values are shown in red and green, respectively.

Table 1. Repeated 5-fold cross-validation score comparison for Cox models built on different subsets of the original CT feature set. We identify as baseline the performance obtained with the entire set of available CT radiomic features.

Table 2. Repeated 5-fold cross-validation score comparison for Cox models built on different subsets of the original PET feature set. We identify as baseline the performance obtained with the entire set of available PET radiomic features.

Table 3. Selected PET features and corresponding Cox coefficients, standard errors, 95% confidence intervals, and p-values.

Table 4. Selected CT features and corresponding Cox coefficients, standard errors, 95% confidence intervals, and p-values.

Table 5. Combined selected CT and PET features after further selection with the GA, with corresponding Cox coefficients, standard errors, 95% confidence intervals, and p-values.

Table 7. Results obtained by the models trained with radiomic + recurrence features, evaluated on the test sets. The metric used for evaluation was the concordance index.