Relevance of Dynamic 18F-DOPA PET Radiomics for Differentiation of High-Grade Glioma Progression from Treatment-Related Changes

This study evaluates the relevance of 18F-FDOPA PET static and dynamic radiomics for differentiation of high-grade glioma (HGG) progression from treatment-related changes (TRC) by comparing diagnostic performances to the current PET imaging standard of care. Eighty-five patients with histologically confirmed HGG, investigated by dynamic 18F-FDOPA PET in two institutions, were retrospectively selected. ElasticNet logistic regression, Random Forest and XGBoost machine learning models were trained with different sets of features: radiomics extracted from static tumor-to-background-ratio (TBR) parametric images, radiomics extracted from time-to-peak (TTP) parametric images, and a combination of both. The aim was to discriminate glioma progression from TRC at 6 months from the PET scan. Diagnostic performances of the models were compared to a logistic regression model with TBRmean ± clinical features used as reference. Training was performed on data from the first center, while external validation was performed on data from the second center. Best radiomics models showed only slightly better performances than the reference model (respective AUCs of 0.834 vs. 0.792, p < 0.001). Our current results show similar findings at the multicentric level using different machine learning models and report a marginal additional value for TBR static and TTP dynamic radiomics over the classical analysis based on TBR values.


Introduction
Amino-acid PET radiotracers, such as 3,4-dihydroxy-6-[18F]-fluoro-L-phenylalanine (18F-FDOPA), are particularly useful for diagnosis of glioma recurrences [1][2][3], specifically high-grade gliomas (HGG) [4]. This is one of the underlying reasons why RANO (Response Assessment Neuro-oncology Group) has recommended assessment of gliomas using amino-acid PET radiotracers, in combination with MRI [5]. Indeed, one of the main limitations of conventional MRI is its inability to accurately differentiate glioma progression from treatment-related changes (TRC), given the relatively similar contrast enhancements observed in the two entities [5].
Amino-acid PET imaging in neuro-oncology is currently a fast-growing field, with diagnostic performances enhanced by dynamic [6] and/or radiomic [7] analyses. Radiomics, which involves extracting large amounts of image features, including morphological, statistical and textural features to characterize tumor heterogeneity, has not been widely studied in the context of glioma recurrence. To date, very few studies have investigated whether radiomic analyses of amino-acid PET can differentiate glioma progression from treatment-related changes [8][9][10]. These studies did, however, show that radiomics could yield high diagnostic performances in this field, though none investigated 18F-FDOPA at a multi-centric level while directly comparing its performance to the current clinical standard of PET imaging (i.e., the classical tumor-to-background ratio (TBR) parameters used in routine practice). The integration of dynamic PET imaging added considerable predictive value to conventional static parameters in terms of the initial diagnosis of glioma [6,11]. It is nevertheless noteworthy that this predictive value could not be extended to glioma recurrences, at least based on data from the currently available literature [3,12,13]. Indeed, our team recently showed, in a single-center 18F-FDOPA PET study, that performances of dynamic parameters to differentiate glioma progression from treatment-related changes were lower than those of conventional static parameters in a population of mixed low-grade and high-grade gliomas (respective accuracies of 77% and 96% [3]). Recent studies have also reported better diagnostic performances for radiomic features obtained from dynamic parametric images as compared to more conventional dynamic parameters extracted from volumes of interest (VOIs) [12,13].
The same authors reported improved diagnostic performances of dynamic parameters extracted from the voxel level coupled with radiomic analysis compared to dynamic parameters extracted from a VOI, with the latter unable to predict the presence of TERT promoter mutation in gliomas at the initial diagnosis [12,14].
The current study therefore aims to evaluate the relevance of 18F-FDOPA PET static and dynamic radiomic features, sourced from two independent nuclear medicine departments, for differentiation of HGG progression from TRC by assessing their diagnostic performances and comparing them to the current standard of PET imaging care.

Patients
To discriminate progression from TRC, we retrospectively identified patients with a histologically confirmed HGG investigated by dynamic 18 F-FDOPA PET between November 2015 and June 2020, from two different institutions (CHRU of Nancy and Pitié-Salpêtrière hospital in Paris, France). All surgical tumor samples or stereotactic biopsies were classified according to the WHO 2016 classification [15]. To reduce the risk of 18 F-FDOPA PET false positives, only patients with a minimum 3-month interval between the end of radiation therapy and the 18 F-FDOPA PET acquisition were included. Final diagnoses were either determined from the histopathology or from the clinical-radiology follow-up during the 6-month follow-up period, based on the RANO working group criteria [5]. All patients included in the study gave their informed consent. The institutional ethics committee (Comité d'Ethique du CHRU de Nancy-FRANCE) approved the evaluation of retrospective patient data on 26 August 2020. The trial was registered at ClinicalTrials.gov (NCT04469244). The study complied with the principles of the Declaration of Helsinki.

PET Data Acquisition and Processing
All patients were asked to fast for at least 4 h prior to the PET scan, and some patients also received carbidopa 1 h prior to their exam, depending on the procedural protocol in place at the respective centers. Following the injection of 2-3 MBq of 18F-FDOPA per kg of body weight, a 30-min dynamic PET acquisition was performed. Static PET images were reconstructed from the list mode data using the last 20 min of the acquisition. For dynamic PET images, 30 frames of 1 min each were reconstructed [11].

Image Pre-Processing and Feature Extraction
To correct for the different voxel sizes of reconstructed images, all PET images were resampled into images with 2 × 2 × 2 mm³ voxels using the SimpleITK Python package [17] with a linear interpolation according to the Image Biomarker Standardization Initiative (IBSI) recommendations. Healthy brain and tumor VOI segmentations were performed by a nuclear physician (L.R.) using LifeX software (lifexsoft.org) [18], as previously described [11]. For healthy brain, a crescent-shaped VOI was positioned manually on three consecutive image slices on the semi-oval center of the unaffected hemisphere to include both white and gray matter [19]. Tumor VOIs were determined by semi-automatic segmentation, using a threshold of 1.6 times the mean standardized uptake value (SUVmean) of healthy brain [19].
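The thresholding step can be illustrated with a minimal NumPy sketch. It is not the LifeX semi-automatic procedure used in the study: the function name `segment_tumor`, the `candidate_mask` argument (standing in for the operator-defined search region) and the toy volume are assumptions introduced for illustration only.

```python
import numpy as np

def segment_tumor(suv, brain_mask, candidate_mask, threshold=1.6):
    """Keep candidate voxels whose SUV exceeds threshold x healthy-brain SUVmean.

    suv            : 3D array of SUV values
    brain_mask     : boolean mask of the healthy-brain reference VOI
    candidate_mask : boolean mask of the region searched for tumor (hypothetical input)
    """
    suv_mean_brain = float(suv[brain_mask].mean())
    tumor_mask = candidate_mask & (suv > threshold * suv_mean_brain)
    return tumor_mask, suv_mean_brain

# Toy volume: background uptake of 1.0 with a 2 x 2 x 2 "lesion" at 2.0
suv = np.ones((4, 4, 4))
suv[1:3, 1:3, 1:3] = 2.0
brain_mask = np.zeros(suv.shape, dtype=bool)
brain_mask[0, :, :] = True                      # reference VOI placed in background
candidate_mask = np.ones(suv.shape, dtype=bool)

tumor_mask, suv_mean_brain = segment_tumor(suv, brain_mask, candidate_mask)
print(int(tumor_mask.sum()), suv_mean_brain)
```

In this toy case only the eight lesion voxels exceed 1.6 times the reference uptake, so they alone form the tumor VOI.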
A correction for patient movements was performed on dynamic images to reduce any potential impact on voxel time-activity curves (TACs) due to long acquisition times. Dynamic images were registered on the CT for PET/CTs and on the MRI T1-enhanced gadolinium images for the PET/MR system. Moreover, working at a voxel level implies a greater influence of noise in TACs. Prior to TAC extraction, dynamic images were therefore denoised using the highly constrained backprojection local reconstruction (HYPR-LR) method, which has shown promising results for PET images [20]. As recommended in [20], PET images were denoised based on separate composites of uptake (frame 1:8), specific retention (frame 8:20), equilibrium (frame 20:30) and a 3D Gaussian with a FWHM of 9 mm.
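As a rough illustration of the denoising principle, the sketch below applies the commonly described HYPR-LR form, in which each frame is replaced by its composite image multiplied by the ratio of the low-pass-filtered frame to the low-pass-filtered composite. It is an approximation under stated assumptions, not the implementation used in the study: the 0-based frame segments `(0, 8)`, `(8, 20)`, `(20, 30)`, the use of a summed composite, the `eps` regularizer and the function name `hypr_lr` are all illustrative choices. The Gaussian FWHM of 9 mm is converted to a standard deviation in voxels for the 2 mm grid.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# FWHM = 2 * sqrt(2 * ln 2) * sigma
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def hypr_lr(dynamic, segments, fwhm_mm=9.0, voxel_mm=2.0, eps=1e-8):
    """Denoise a dynamic series (T, Z, Y, X) with one composite per segment."""
    dynamic = np.asarray(dynamic, dtype=float)
    sigma_vox = fwhm_mm * FWHM_TO_SIGMA / voxel_mm
    out = np.empty_like(dynamic)
    for start, stop in segments:
        composite = dynamic[start:stop].sum(axis=0)        # high-SNR composite
        smooth_composite = gaussian_filter(composite, sigma_vox)
        for t in range(start, stop):
            # Low-resolution weighting image carrying the temporal information
            weight = gaussian_filter(dynamic[t], sigma_vox) / (smooth_composite + eps)
            out[t] = composite * weight
    return out

segments = [(0, 8), (8, 20), (20, 30)]   # assumed uptake / retention / equilibrium

# Spatially uniform frames should pass through (almost) unchanged
uniform = np.stack([np.full((6, 6, 6), t + 1.0) for t in range(30)])
uniform_out = hypr_lr(uniform, segments)

# Noisy frames around a constant level should come out less noisy
rng = np.random.default_rng(0)
noisy = 10.0 + rng.normal(0.0, 1.0, size=(30, 8, 8, 8))
noisy_out = hypr_lr(noisy, segments)
print(noisy[0].std(), noisy_out[0].std())
```

The design point is that spatial detail comes from the composite while temporal information comes from the smoothed frame, which is why voxel-level TACs become usable after this step.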
Static images were normalized to the SUV mean of healthy brain VOIs to neutralize the impact of carbidopa premedication on SUV measurements and to create static TBR parametric images. To avoid amplifying noise from TACs of dynamic images, voxel TAC ratio was obtained by dividing the preliminary fitted voxel tumor TAC by the fitted mean brain TAC. These normalization methods were previously validated elsewhere [21]. Time-to-peak (TTP) values, which represent the time interval between tracer injection and the time point of the maximal TAC value, were extracted from each individual tumor at the voxel level to generate parametric TTP images.
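The construction of the two parametric images can be sketched as follows. This is a simplified illustration, assuming that the voxel TAC ratios have already been fitted and that each 1-min frame is represented by its mid-frame time; the function names `tbr_image` and `ttp_image` and the toy curves are hypothetical.

```python
import numpy as np

def tbr_image(static_suv, suv_mean_brain):
    """Static TBR parametric image: SUV divided by healthy-brain SUVmean."""
    return np.asarray(static_suv, dtype=float) / suv_mean_brain

def ttp_image(tac_ratio, frame_times_min, tumor_mask):
    """Voxel-wise time-to-peak from a (T, Z, Y, X) array of TAC ratios.

    Voxels outside the tumor VOI are set to NaN.
    """
    tac_ratio = np.asarray(tac_ratio, dtype=float)
    peak_frame = np.argmax(tac_ratio, axis=0)              # frame index of the maximum
    ttp = np.asarray(frame_times_min, dtype=float)[peak_frame]
    return np.where(tumor_mask, ttp, np.nan)

# 30 frames of 1 min, represented by mid-frame times (assumption)
frame_times = np.arange(30) + 0.5

tac = np.zeros((30, 1, 1, 2))
tac[:, 0, 0, 0] = 2.0 - 0.01 * (frame_times - 5.5) ** 2   # early peak at 5.5 min
tac[:, 0, 0, 1] = 1.0 + 0.02 * frame_times                # steadily increasing curve
mask = np.ones((1, 1, 2), dtype=bool)

ttp = ttp_image(tac, frame_times, mask)
tbr = tbr_image(np.array([1.2, 2.4]), suv_mean_brain=1.2)
print(ttp.ravel(), tbr)
```

The early-peaking voxel receives a TTP of 5.5 min, whereas the steadily increasing voxel peaks in the last frame.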

Feature Extraction
From both static TBR and dynamic TTP parametric images, 94 radiomic features, including statistical, histogram-based, local-intensity and textural features were extracted using the tumor VOIs shared between the two types of images. Additionally, 11 common morphological features between the two image types were extracted. An absolute discretization of the images was performed with fixed bin sizes of 0.1 SUV and 1 min, respectively, for the static TBR and dynamic TTP parametric images, when required. To allow bins from different discretized images to be compared, the first bin was always designated as 0. For textural matrices, a 3D merging strategy was used [22], and only neighbors at a distance of 1 voxel were considered, with no distance weighting. To extract radiomic features according to the IBSI [22], the pyradiomics package was used (Available online: https://github.com/Radiomics/pyradiomics, accessed on 2 November 2021), as well as an in-house software for local-intensity features that were not available in pyradiomics [11]. Mathematical justifications of radiomic features have been given in [22]. To remove effects introduced by the use of different PET systems, features were harmonized with the modified ComBat method (Available online: https://github.com/Jfortin1/neuroCombat (accessed on 9 November 2021)) [23,24], with a digital Vereos PET device as a reference. Device effects were computed in a non-parametric manner using the empirical Bayes method to pool information across features. No biological covariates were considered. To investigate the effect of clinical data in combination with 18 F-FDOPA PET parameters in the reference model, several clinical features, including age, sex, histopathological WHO grade, IDH mutation status, 1p/19q codeletion status, previous tumor resection and contrast enhancement on MRI were considered.
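The absolute discretization can be illustrated with a short sketch of the fixed-bin-size scheme. It assumes that "the first bin was always designated as 0" means that the lower edge of the first bin is 0 for every image, so that bin numbers are comparable across patients; the function name is hypothetical.

```python
import numpy as np

def discretize_fixed_bin_size(values, bin_width, lower_bound=0.0):
    """Fixed-bin-size discretization with a common lower bound.

    Bin 1 covers [lower_bound, lower_bound + bin_width), bin 2 the next
    interval, and so on. Values lying exactly on a bin edge are subject
    to floating-point rounding.
    """
    values = np.asarray(values, dtype=float)
    return np.floor((values - lower_bound) / bin_width).astype(int) + 1

tbr_bins = discretize_fixed_bin_size([0.05, 1.25, 3.14], bin_width=0.1)  # static TBR image
ttp_bins = discretize_fixed_bin_size([0.5, 7.5, 29.5], bin_width=1.0)    # TTP image, minutes
print(tbr_bins, ttp_bins)
```

Because the lower bound is fixed rather than taken from each image's minimum, a given bin index corresponds to the same TBR or TTP interval in every patient.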

Model Building and Evaluation
Days of progression-free survival were dichotomized to a 6-month threshold and used as a reference label for the classification. To improve robustness and evaluate the general application of the learning algorithms, training and test sets were selected from different hospital centers. Patients from the CHRU of Nancy were used as training sets, and Pitié-Salpêtrière hospital patients were considered test sets. In the machine learning models presented below, all transformations and algorithms were fitted using only the training set and were subsequently applied to the test set.
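The labelling and the "fit on training data only" principle can be sketched as below. The 183-day cut-off for 6 months, the strict inequality and the function names are assumptions made for illustration; the z-score step stands in for any transformation whose parameters are estimated on the training center and then applied unchanged to the test center.

```python
import numpy as np

SIX_MONTHS_IN_DAYS = 183  # assumption: the exact day count is not given in the text

def progression_at_6_months(pfs_days, threshold_days=SIX_MONTHS_IN_DAYS):
    """Label 1 if progression occurred before the threshold, else 0."""
    return (np.asarray(pfs_days) < threshold_days).astype(int)

def fit_zscore(x_train):
    """Estimate normalization parameters on the training set only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # guard against constant features
    return mean, std

def apply_zscore(x, mean, std):
    """Apply previously fitted parameters to any set (training or test)."""
    return (x - mean) / std

labels = progression_at_6_months([30, 182, 183, 400])

x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])   # training-center features
x_test = np.array([[2.0, 40.0]])                              # test-center features
mean, std = fit_zscore(x_train)
z_train = apply_zscore(x_train, mean, std)
z_test = apply_zscore(x_test, mean, std)
print(labels, z_test)
```

Reusing the training mean and standard deviation on the test center avoids leaking information from the external validation set into the model.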
All extracted radiomic features were initially normalized with z-score normalization. Dimensionality reduction was performed using hierarchical clustering based on the absolute Spearman correlation coefficient (SCC) as distance matrix and a threshold of 0.9 [11,25]. These two steps were only performed on the numerical features before merging with the categorical features, where applicable. Due to class imbalance, the adaptive synthetic (ADASYN) sampling technique [26,27] was applied to oversample the minority class. Different machine learning algorithms were evaluated to allow robust comparisons: (I) ElasticNet logistic regression (LR), (II) random forest (RF) and (III) XGBoost (XGB) [28]. Algorithms (I) and (II) were implemented in the scikit-learn Python package (Available online: https://scikit-learn.org/stable/index.html (accessed on 15 November 2021)), and algorithm (III) was implemented in the XGBoost Python package (Available online: https://xgboost.readthedocs.io/en/latest/index.html (accessed on 15 November 2021)). Each of the 3 algorithms was trained with different sets of features: (I) radiomic features extracted from static TBR parametric images (94 static TBR radiomic features and 11 morphological features), (II) radiomic features extracted from TTP parametric images (94 TTP radiomic features and 11 morphological features), (III) a combination of feature sets (I) and (II) (94 static TBR radiomic features, 94 TTP radiomic features and 11 morphological features). As a previous study from our team demonstrated the high level of accuracy of VOI-based TBRmean for prediction of glioma recurrences [3], three additional models were fitted to serve as references: LR trained with (IV) previously mentioned clinical features, (V) TBRmean and (VI) a combination of TBRmean and clinical features.
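The correlation-based dimensionality reduction can be sketched with SciPy as follows. Several details are assumptions rather than the authors' settings: the distance is taken as 1 − |SCC| so that a correlation threshold of 0.9 corresponds to a cut at 0.1, average linkage is used, and the first feature of each cluster is retained. The function name `select_uncorrelated_features` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def select_uncorrelated_features(x_train, threshold=0.9):
    """Cluster features by absolute Spearman correlation; keep one per cluster.

    x_train must have at least three feature columns so that spearmanr
    returns a full correlation matrix.
    """
    rho, _ = spearmanr(x_train)                     # (n_features, n_features)
    dist = np.clip(1.0 - np.abs(rho), 0.0, None)    # assumed distance: 1 - |SCC|
    dist = (dist + dist.T) / 2.0                    # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=1.0 - threshold, criterion="distance")
    kept = sorted(int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels))
    return kept, labels

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
x[:, 1] = x[:, 0] ** 3          # monotone transform of feature 0: |SCC| = 1

kept, labels = select_uncorrelated_features(x)
print(kept, labels)
```

In keeping with the pipeline described above, the clusters would be derived on the training set and the same retained columns then selected in the test set.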
The hyperparameters required for the different models were optimized appropriately. The main objective of hyperparameter tuning is to limit model overfitting and therefore also to better generalize on unseen data. The hyperparameters of the different learning algorithms were tuned only on the training set by applying an internal 5-fold cross validation (CV), which was repeated 20 times. The tuning process was driven by a Bayesian search based on optimization of Gaussian processes (Available online: https://github.com/scikit-optimize/scikit-optimize (accessed on 16 November 2021)), as it showed better results than a classical grid search and a random search [29]. The intervals and distributions for sampling sets of hyperparameters are provided in Table 1. The best hyperparameter set was the one yielding the minimal cross-entropy loss over the 300 iterations of the Bayesian search. Using the optimized hyperparameter set, 1000 models were trained on the training set using 1000 bootstrap iterations. For each bootstrap iteration, out-of-bag samples corresponding to the training samples that were not used to train the bootstrapped model were used to get a generalized performance on the training set that could also be considered a model validation. For each bootstrap, the trained models were then individually applied to the test set. Model performance on the test set was assessed based on different metrics to get a reliable mean generalized performance. The whole pipeline is summarized in Figure 1.
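The bootstrap and out-of-bag logic can be sketched in a few lines. This is a simplified, unstratified version; whether the study stratified the resampling by class is not stated, and the function name `bootstrap_splits` is hypothetical. The sample size of 55 corresponds to the training center.

```python
import numpy as np

def bootstrap_splits(n_samples, n_iterations=1000, seed=0):
    """Yield (in_bag, out_of_bag) index arrays for each bootstrap iteration."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n_samples)
    for _ in range(n_iterations):
        in_bag = rng.integers(0, n_samples, size=n_samples)   # sampling with replacement
        out_of_bag = np.setdiff1d(all_idx, in_bag)            # samples never drawn
        yield in_bag, out_of_bag

splits = list(bootstrap_splits(n_samples=55, n_iterations=1000))
oob_fraction = float(np.mean([len(oob) / 55 for _, oob in splits]))
print(len(splits), round(oob_fraction, 3))
```

On average roughly a third of the training patients are left out of each resample, which is what makes the out-of-bag set usable as an internal validation set before the model is applied to the external test center.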

Statistical Analysis
Categorical variables are expressed as percentages, and continuous variables are expressed as means (range). Spearman correlation coefficients were used to compute correlations between TBR mean and radiomic features from either TBR static or TTP dynamic parametric images. Diagnostic performances were determined from bootstrapped training samples, out-of-bag samples and testing samples using accuracy, area under the curve (AUC), precision, F1 score and balanced accuracy. On each set, the 95% confidence intervals (CI) of individual metrics were derived from the distribution of performances obtained with the individual 1000 bootstrapped, trained models. Unilateral comparisons of superiority were performed using Wilcoxon tests between the 1000 available AUCs, obtained from the predictions of the 1000 bootstrapped models on the test set for different models. Corrections for multiple comparisons [30] were applied. A p-value < 0.05 was considered significant. To evaluate the importance of features in each model, the static/dynamic dataset was assessed using Shapley additive explanations (SHAP) [31] on the test set. All analyses were conducted in Python (version 3.8.5; Available online: https://www.python.org/ (accessed on 2 November 2021)).
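Two of the quantities above can be made concrete with a small NumPy sketch: the AUC computed as the probability that a progression case scores higher than a TRC case, and a 95% interval taken from the distribution of bootstrap results. The percentile method is an assumption, since the text only states that the intervals were derived from that distribution; the function names are hypothetical.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return float((sensitivity + specificity) / 2.0)

def percentile_ci(values, level=0.95):
    """Percentile interval of a bootstrap distribution (assumed method)."""
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(values, [alpha, 1.0 - alpha])
    return float(low), float(high)

auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
bacc = balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])
ci = percentile_ci(np.arange(101))
print(auc, bacc, ci)
```

In the study's setting, `values` would be the 1000 AUCs (or other metrics) obtained from the bootstrapped models on a given set.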

Patient Characteristics
Ninety patients were initially retrospectively selected. Five patients were ultimately excluded to avoid mis-training of the models: three had incomplete clinical information and dynamic images, and data from two additional patients remained too noisy for voxel-based extraction of TTP, even after denoising. The final population therefore included 85 patients (mean age of 57 years, range 21 to 80, 46% women) with dynamic 18F-FDOPA PET acquisitions that could be considered for classification of a progression at 6 months from the PET scan. Seventy patients underwent a PET/CT exam, and 15 patients had a PET/MRI acquisition. The dataset was collected from two different centers. Data for 55 patients were obtained from the CHRU of Nancy, and the remaining 30 from the Pitié-Salpêtrière hospital (61 progressions at 6 months, 37 in data from Nancy and 24 in data from Paris). Fifty (59%) patients were premedicated with carbidopa. Tumor histopathology at initial diagnosis was performed either on tissue obtained during surgery (55 patients, 65%) or on biopsy tissue (30 patients, 35%). According to the WHO 2016 classification of gliomas, 8 (9%) patients were classified as having IDH-mutant anaplastic astrocytomas, 12 (14%) as having IDH-wildtype anaplastic astrocytomas, 10 (12%) as having IDH-mutant and 1p/19q-codeleted anaplastic oligodendrogliomas, 6 (7%) as having IDH-mutant glioblastomas and 49 (58%) as having IDH-wildtype glioblastomas.
Figure 2 details the correlation coefficients between TBRmean and either: (a) morphological features, (b) radiomic features from static TBR parametric images or (c) radiomic features from dynamic TTP parametric images. Low correlation coefficients were obtained between the reference TBRmean feature and morphological features. In static TBR parametric images, some families of features (statistical, NGTDM, GLSZM and NGLDM) exhibited low correlation coefficients with TBRmean. A large number of features extracted from dynamic TTP parametric images were weakly correlated with TBRmean. All these findings suggest the potential added value of these parameters in the reference TBR model.

Classification of Progression at the 6-Month Follow-Up
The reference model for imaging features using LR trained with TBRmean yielded a mean AUC value of 0.792 with a 95% CI of [0.792, 0.792] on the test set. Since the model was not complex and only involved one feature, the AUC value in the bootstrap analysis was the same (i.e., 0.792), which gave rise to a restricted 95% CI.
Figure 2. Heatmaps of correlation coefficients between TBRmean and: (a) morphological features, (b) radiomic features from static TBR parametric images, (c) radiomic features from dynamic TTP parametric images. The features with light color show lower correlation coefficients with the TBRmean feature, as is the case for morphological features, a limited number of radiomic features from static TBR parametric images (statistical, NGTDM, GLSZM and NGLDM families) and a large number of features extracted from dynamic TTP parametric images. This information suggests the potential added value of these parameters in the reference TBR model.
Table 2 footnotes: * p-value significant for the comparison with the reference TBRmean model; ¥ p-value significant when compared to the static dataset using the same machine learning model; § p-value significant when compared to the dynamic dataset using the same machine learning model; ‡ p-value significant when compared to the static + dynamic dataset using the same machine learning model.
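The degenerate confidence interval of the single-feature reference model follows from a general property of the AUC: it depends only on the ranking of the test cases, and a one-feature logistic regression with a positive coefficient is a monotone transform of that feature. The sketch below illustrates this with simulated values; the random positive slopes and intercepts merely stand in for bootstrapped fits, and all data are synthetic.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Synthetic test set of 30 cases with one feature (stand-in for TBRmean)
tbr_test = rng.uniform(1.0, 3.0, size=30)
y_test = (tbr_test > np.median(tbr_test)).astype(int)
y_test[:5] = 1 - y_test[:5]                 # flip a few labels so the AUC is below 1

aucs = []
for _ in range(1000):
    slope = rng.uniform(0.1, 3.0)           # any positive coefficient
    intercept = rng.normal()
    aucs.append(auc_score(y_test, sigmoid(slope * tbr_test + intercept)))

reference_auc = auc_score(y_test, tbr_test)
print(reference_auc, min(aucs), max(aucs))
```

Every simulated model ranks the fixed test cases in the same order as the raw feature, so all 1000 AUCs coincide and the interval collapses to a single value.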
Introspection of the different models using SHAP values was provided for the static/dynamic datasets, as they include radiomic features of static TBR and dynamic TTP parametric images (Figure 3). For RF and XGB models, the TTP dynamic radiomic features gave values of the highest importance, with the 10th percentile from the statistics family and large-zone low-grey-level emphasis from the grey-level size-zone matrix (GLSZM) being the two most influential features. Although TBR static radiomic features were the most important for the LR model, TTP dynamic radiomic features accounted for a large part of the model's prediction capabilities. For all models, morphological and statistical features from both TBR static and TTP dynamic images, as well as features from texture matrices like GLSZM, neighborhood grey-tone difference (NGTDM) and grey-level run length (GLRLM) from TTP dynamic images, contributed most to the predictions. All the extracted data are provided in Supplementary Table S1.

Discussion
The current study highlights that radiomic features extracted from static TBR and dynamic TTP parametric images only provide slightly better performances in discrimination of HGG progression from TRC, compared to a simple model that only considers static TBR mean parameters, as is currently performed in routine practice. This result was obtained by applying the robust radiomics method analysis in parallel with the current standards, using two independent training and testing patient datasets. Moreover, three different machine learning models were tested and led to the same results, thus strengthening the current findings.
Diagnostic performances for differentiation of HGG progression from TRC (AUC of 0.79 for the current reference model) are lower than those obtained in our previous work (AUC of 0.98 for the TBRmean [3]), albeit within the range of values obtained in studies of large numbers of patients (AUC of 0.78 in a series of 110 patients with 18F-FDOPA PET [1] and of 0.75 in a series of 127 patients with 18F-FET PET for TBRmean [10]). These lower performances obtained in our current work, when compared to our previous single-center study [3], may be related to a larger population size (85 patients vs. 51) and to the multi-centric nature of the present analysis. Interestingly, correlation analyses performed in the present study show that radiomic features from the morphological family, from several feature families of static TBR parametric images and, to a greater extent, radiomic features from TTP dynamic parametric images could provide additional value to the routinely used TBRmean parameter, as suggested by the low correlation coefficients between these radiomic features and the TBRmean parameter (Figure 2). This justifies performing the present study to evaluate the added value of such radiomic features for differential diagnosis of HGG progression and TRC.
We previously reported that dynamic features extracted from a tumor VOI and radiomic features from static TBR parametric images were of added value for the prediction of molecular parameters at initial diagnosis of gliomas [11]. Our current results do not really replicate these findings for the prediction of recurrence in HGG. Results from our machine learning models only marginally outperform those of routine PET imaging based on the TBRmean model. The latter has been defined as our reference since the addition of clinical features to this model did not show any significant diagnostic performance improvements (AUC of 0.79 for the combination of TBRmean and clinical features, Table 2). In the context of differential diagnosis of glioma progression and TRC, dynamic parameters of amino-acid PET radiotracers, exclusively extracted from tumor VOIs, did not improve on the diagnostic performances of static parameters reported in the literature [3,32-36].
To date, few studies have investigated the value of amino-acid PET radiomic features in the context of glioma progression [8][9][10]. In a series of 34 glioblastoma patients, Lohmann et al. found that after increasing the number of 18F-FET PET scans to 102 by data augmentation, the reference TBRmean model after ROC analysis gave an AUC of 0.73, similar to the AUC of 0.74 obtained with their machine learning model for diagnosis of glioma progression [10]. In a series of 160 gliomas, Wang et al. found that a logistic regression model of static 11C-methionine PET radiomic features resulted in an AUC of 0.75 to differentiate glioma progression from TRC. These performances were increased to an AUC of 0.91 when 11C-methionine PET radiomic features were combined with those of 18F-FDG PET and contrast-enhanced MRI images [8]. However, the Wang et al. study did not include a comparison with a reference standard PET imaging model based only on SUV or TBR parameters. Carles et al. showed significant discrimination of progression-free survival in a series of 32 recurrent glioblastomas before repeat irradiation using Kaplan-Meier curves, but no C-index performances were reported, nor were comparisons to standard PET imaging TBR values included [9]. In contrast to the Wang et al. and Carles et al. studies, Lohmann et al. integrated dynamic parameters extracted from a VOI into their analyses [10]. To the best of our knowledge, our current study is therefore the first to include dynamic TTP parametric images to extract dynamic radiomic features to identify glioma recurrences. Interestingly, machine learning models integrating radiomic dynamic datasets systematically achieved better performances than those only involving radiomic static datasets (Table 2). This is also supported by the greater importance attributed to radiomic features extracted from dynamic TTP parametric images in models trained with static/dynamic datasets (Figure 3).
In addition, no other study has, to date, attempted to directly compare results obtained from radiomics machine learning models to those of conventional TBR static parameters also obtained from a machine learning process.
Radiomics extraction is a challenging and complex process that requires important steps in order to obtain accurate results. A meticulous methodological approach was followed to extract radiomic features according to the IBSI guidelines [22]. In addition, building machine learning models integrates crucial steps of feature normalization, dimension reduction [11,25] and correction for class imbalance by oversampling [26,27]. Three different machine learning models were applied, i.e., LR, RF and XGB models, which all yielded similar results, thereby strengthening the finding that radiomic features only provide marginal additional value over a simple model involving only TBRmean static parameters (Table 2). The SHAP values provided in Figure 3 confirm our results. Although these three machine learning models are based on different algorithms, very similar features or families of features are selected among the different models to build the optimized models (morphology family, statistics family from TBR static and TTP dynamic images, features extracted from textural matrices like GLSZM, NGTDM, GLRLM from TTP dynamic images for the three models). Importantly, and in contrast to the Lohmann, Wang and Carles studies [8][9][10], our current study used radiomics on amino-acid PET imaging to identify glioma recurrences by training models on patient data from the center in Nancy, with external validation performed on different patient data sourced from Paris, which is an important criterion of robustness [37,38]. Moreover, it has been previously mentioned that this is a crucial aspect of reporting results for radiomic analyses, even if it leads to modest results, as is the case in the present study [39], i.e., only limited additional value of radiomic features over the conventional TBR parameter.
Our study suffers from several limitations. First, our population of HGGs included grade 3, as well as grade 4, gliomas, which may have opposing progression profiles, as would, for example, be expected for an anaplastic oligodendroglioma and an IDH-wildtype glioblastoma. Moreover, although features were harmonized with the modified ComBat method, our study derived radiomic features from four different PET scanners using locally optimized acquisition and reconstruction parameters. Finally, our study did not identify any progression-free survival or overall survival benefits since radiomic features did not show significant added value over our conventional TBR parameter for our primary endpoint (progression at 6 months).

Conclusions
Radiomic features from static TBR and dynamic TTP parametric images only provide marginal additional value over a classical analysis based on TBR values for differentiation of HGG progression from TRC. These results are based on a robust machine learning analysis and may be of interest to nuclear physicians to limit the need to develop time-consuming routine radiomic PET imaging processes for this indication.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/biomedicines9121924/s1, Table S1: Excel file including all of the extracted data.