1. Introduction
Hodgkin lymphoma (HL) is one of the most common malignancies involving the lymphatic system. The 5-year survival rate for all people with Hodgkin lymphoma is high, with an overall rate of around 87% [
1].
Given the bimodal peak incidence with high rates of presentation at a young age, an individualized, risk-adapted therapy is desirable to maintain high cure rates while minimizing treatment-related toxicity [
2,
3].
Correct identification of predictive biomarkers that correlate with poor therapy response and an overall poor prognosis is essential for a personalized therapy approach: it allows the selection of patients who would benefit from an initially more aggressive therapy while avoiding overtreatment in patients with a high likelihood of a good prognosis [
4,
5,
6,
7].
Molecular imaging with clinical standard positron emission tomography (PET)/computed tomography (CT) using the radiopharmaceutical
18F-fluoro-deoxy-glucose (FDG) is the main imaging procedure for baseline staging of lymphoma, interim response assessment, and evaluation of residual disease in many jurisdictions worldwide [
8].
Standardized uptake value (SUV) obtained from FDG PET/CT scans is the most widely used parameter for lesion depiction and characterization, and it provides a reliable assessment of tumor activity, tumor aggressiveness, and response to treatment [
9].
However, SUV is not reflective of the underlying spatial distribution of tracer activity within a tumor itself, which can be particularly heterogeneous in lymphoma [
10]. The unequal distribution of tracer activity within a tumor on FDG PET/CT is a manifestation of this ‘intra-tumor heterogeneity’, which can be measured by analyzing the variation in the spatial arrangements of voxel intensities [
11].
In recent years, there has been increasing interest in radiomics, the science of extracting and analyzing quantitative, mineable features, including texture features, from standard-of-care cross-sectional biomedical images (CT, MRI, and PET), which may provide detailed information on the underlying pathophysiology. Radiomic features of a tumor may provide additional information regarding tumoral biology and behavior [
12,
13,
14,
15,
16].
Numerous studies have investigated intra-tumor heterogeneity on PET/CT in patients with brain, head, neck, thyroid, lung, breast, esophagus, pancreas, colon, and cervix neoplasms, as well as in patients with sarcomas and lymphomas [
17,
18,
19,
20,
21].
Current clinical lymphoma biomarkers incorporate cellular and molecular data to classify specific disease subtypes and predict clinical behavior [
22].
The association between intra-tumor image-based heterogeneity and biological heterogeneity has been shown to correlate with clinical outcomes such as treatment response and survival in a variety of tumor types, including lymphoma. This suggests that radiomic biomarkers can be developed and cross-referenced with established clinical cellular and molecular biomarkers to better predict outcomes and influence evidence-based clinical decision-making in patients with lymphoma [
22,
23,
24,
25,
26,
27,
28,
29,
30].
This study aims to evaluate the prognostic value of joint PET and CT radiomics combined with standard clinical parameters in patients with HL. We hypothesize that some radiomic features within the baseline PET/CT may predict survival outcomes.
2. Materials and Methods
2.1. Study Cohort
In this institutional review board-approved retrospective study, 88 patients diagnosed and treated in a tertiary referral center with HL from September 2012 to June 2016 were evaluated. Given the retrospective nature of the analysis, consent was waived.
All patients had complete clinical records, including pathology reports from either nodal or extra-nodal biopsies, descriptions of sites of involvement, presence of bulky disease, Ann Arbor Stage, and B symptoms. Furthermore, all standard of care bloodwork, systemic treatment planned and received, as well as the provision of radiotherapy treatment along with response assessment for each line of therapy, were recorded. Follow-up times and progression-free survival were also registered.
Bulky disease was defined as more than 10 cm in any diameter.
Complete metabolic response (CMR) was defined as a Deauville score below 4 on the end-of-therapy PET/CT [
31,
32,
33,
34,
35,
36,
37].
2.2. Imaging Acquisition
18F FDG PET/CT was performed in these patients as a component of baseline staging. Images were obtained according to our institutional protocol, as follows: [
23]. PET was performed on a Siemens mCT40 PET/CT scanner (Siemens Healthcare). Patients were positioned supine with their arms outside the region of interest. Images were obtained from the top of the skull to the upper thighs. Iodinated oral contrast material was administered for bowel opacification; no intravenous iodinated contrast material was used. Patients were asked to avoid exercise for 24 h and fast for 6 h before the examination. Patients received an IV injection of 5 MBq/kg (a range of 250–550 MBq) of FDG.
Overall, 5–9 bed positions were obtained, depending on patient height, with an acquisition time of 2–3 min per bed position. CT parameters were: 120 kV; 3.0 mm slice width; 2.0 mm collimation; 0.8 s rotation time; 8.4 mm feed/rotation. A PET emission scan using time of flight with scatter correction was obtained, covering the identical transverse field of view. The PET parameters were as follows: image size: 2.6 pixels; slice: 3.27; and a Gaussian filter with a 5-mm full width at half maximum (FWHM).
2.3. Textural Analysis
Textural analysis of the PET/CT images was performed using the freely available software LIFEx (lifexsoft.org, version 6.0, May 2020), which quantifies various radiomic features based on the spatial arrangement and variation of pixel intensities within a defined volume of interest [
38]. The radiomic features were extracted from the segmented volumes in accordance with the image biomarker standardization initiative (IBSI) guidelines. Primary contouring of FDG-avid nodal and extra-nodal lesions was performed semi-automatically by the software (with minor manual correction when needed), using a thresholding method to define each volume of interest (VOI). Contouring was done by two radiologists with >5 years of experience (YE and CO) and supervised by a senior radiologist with >10 years of experience (PVH).
PET volumes of interest (VOI) were defined based on (a) background thresholds, (b) peak thresholds, (c) thresholds at 40%, and (d) thresholds at 70% of the SUVmax PET VOI [
26].
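The relative thresholds in (c) and (d) can be illustrated with a minimal Python sketch. It is not the LIFEx implementation: the function name and the SUV values are hypothetical, and the voxels are given as a flat list taken from a coarse region around one lesion rather than from a 3D image.

```python
def relative_threshold_mask(suv_values, fraction):
    """Flag voxels whose SUV is at least `fraction` of the regional SUVmax."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    suv_max = max(suv_values)
    cutoff = fraction * suv_max
    return [value >= cutoff for value in suv_values]


# Hypothetical SUVs for voxels inside a coarse region drawn around one lesion.
region_suv = [1.2, 2.5, 4.1, 6.0, 9.8, 10.0, 7.3, 3.9, 1.0]

mask_40 = relative_threshold_mask(region_suv, 0.40)  # cutoff at 40% of SUVmax
mask_70 = relative_threshold_mask(region_suv, 0.70)  # cutoff at 70% of SUVmax

voxels_40 = sum(mask_40)  # number of voxels kept in the 40% VOI
voxels_70 = sum(mask_70)  # number of voxels kept in the 70% VOI
```

A higher fraction yields a smaller VOI concentrated on the most FDG-avid part of the lesion, which is why features extracted from the 40% and 70% VOIs can differ.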
Individual lesions were measured, with a maximum of two per organ when multiple as per RECIST guidelines [
39,
40] and then labeled as nodal or extra-nodal involvement for each specific site (
Figure 1). Lesions smaller than 64 voxels were excluded since they did not fulfill the minimum size criteria for feature extraction by the radiomics software.
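The lesion selection rules above can be sketched as follows. The 64-voxel minimum and the limit of two lesions per organ come from the text; ranking lesions by voxel count when more than two are present is an assumption made only for this illustration, and the lesion records are hypothetical.

```python
from collections import defaultdict

MIN_VOXELS = 64     # minimum VOI size stated in the text
MAX_PER_ORGAN = 2   # at most two lesions per organ


def select_lesions(lesions):
    """Drop VOIs below MIN_VOXELS, then keep up to MAX_PER_ORGAN per organ.

    Ranking by voxel count is an assumption of this sketch, not a rule
    taken from the study.
    """
    by_organ = defaultdict(list)
    for lesion in lesions:
        if lesion["n_voxels"] >= MIN_VOXELS:
            by_organ[lesion["organ"]].append(lesion)
    selected = []
    for organ in sorted(by_organ):
        ranked = sorted(by_organ[organ], key=lambda l: l["n_voxels"], reverse=True)
        selected.extend(ranked[:MAX_PER_ORGAN])
    return selected


# Hypothetical lesion records.
lesions = [
    {"id": "A", "organ": "mediastinum", "n_voxels": 900},
    {"id": "B", "organ": "mediastinum", "n_voxels": 40},
    {"id": "C", "organ": "mediastinum", "n_voxels": 300},
    {"id": "D", "organ": "mediastinum", "n_voxels": 120},
    {"id": "E", "organ": "spleen", "n_voxels": 64},
]
selected_ids = [lesion["id"] for lesion in select_lesions(lesions)]
```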
Since a thresholding method is not available for the CT component, the contours for the CT-derived volume of interest were performed manually, slice-by-slice, to cover the entire tumor volume as previously described in the literature.
Sixty-five radiomic features (RF) were obtained by the software, including conventional metrics reporting the mean, median, maximum, and minimum values of the voxel intensities on the image; size and shape features such as volume, compacity, and sphericity; histogram-based features including asymmetry (skewness), flatness (kurtosis), uniformity, and randomness; and textural features such as GLCM (Gray-Level Co-occurrence Matrix), GLRLM (Grey-Level Run Length Matrix), NGLDM (Neighborhood Grey-Level Difference Matrix), and GLZLM (Grey-Level Zone Length Matrix).
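To make the feature classes more concrete, the sketch below computes two histogram features (skewness and kurtosis) and one GLCM-derived texture feature (contrast) on toy data. It is a simplified illustration and not the LIFEx implementation: the GLCM here uses a single horizontal offset in 2D, whereas radiomics software typically works in 3D and averages over several directions, and exact feature definitions (for example, whether kurtosis is reported as excess kurtosis) may differ. All function names and toy images are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of voxel intensities."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def glcm_horizontal(image, levels):
    """Symmetric, normalized co-occurrence matrix for horizontal neighbors."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probability weighted by squared level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Toy 4x4 images already discretized to 4 grey levels (0..3).
uniform = [[1, 1, 1, 1] for _ in range(4)]
checker = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

contrast_uniform = glcm_contrast(glcm_horizontal(uniform, 4))
contrast_checker = glcm_contrast(glcm_horizontal(checker, 4))
skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
```

The homogeneous image has zero contrast, while the checkerboard, with the same number of voxels, has high contrast. This is the kind of spatial information that a single intensity summary cannot capture.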
2.4. Statistical Analysis
In this study, two main outcomes were considered. First, progression-free survival (PFS) was defined as the time from the date of diagnosis to the date of first progression (or relapse), death, or last follow-up; events were progression or death. The second endpoint, termed radiotherapy outcome, assessed whether radiomics at baseline PET can predict the need for radiotherapy after the completion of chemotherapy. This endpoint is binary: patients who received radiation were assigned a value of 1, and those who did not were assigned a value of 0.
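The two endpoint definitions can be encoded as in the following Python sketch. The function names, the example dates, and the conversion factor of 30.4375 days per month are assumptions made for illustration; the study itself does not specify how time was computed.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # average month length; an assumption of this sketch


def pfs_months(diagnosis, progression=None, death=None, last_follow_up=None):
    """Return (time_in_months, event).

    event = 1 if progression/relapse or death occurred (earliest date is used),
    event = 0 if the patient is censored at last follow-up.
    """
    event_dates = [d for d in (progression, death) if d is not None]
    if event_dates:
        end, event = min(event_dates), 1
    elif last_follow_up is not None:
        end, event = last_follow_up, 0
    else:
        raise ValueError("need a progression, death, or last follow-up date")
    return (end - diagnosis).days / DAYS_PER_MONTH, event


def radiotherapy_outcome(received_radiotherapy):
    """Binary endpoint: 1 if radiotherapy was given, 0 otherwise."""
    return 1 if received_radiotherapy else 0


# Hypothetical patients.
t_event, e_event = pfs_months(date(2013, 1, 1), progression=date(2014, 1, 1))
t_cens, e_cens = pfs_months(date(2013, 1, 1), last_follow_up=date(2015, 1, 1))
rt_flag = radiotherapy_outcome(True)
```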
The characteristics of patients were presented as means and standard deviations for continuous variables and as frequencies and percentages for categorical variables. Univariate and multivariate models were used to determine the role of baseline demographics, clinical and laboratory characteristics, and FDG PET/CT radiomics in predicting the outcome of patients with lymphoma. A binary logistic regression model was used to determine potential risk factors for radiotherapy outcomes and the odds ratios (OR) and 95% confidence intervals (CI) were reported. The Cox proportional hazards regression, on the other hand, was used to determine PFS outcome factors and to calculate hazard ratios (HR) with 95% CI. The initial selection of informative variables for creating the best prediction models was accomplished through univariate analysis and repeated 10-fold cross-validation. Cross-validation was applied to all classes of baseline demographics, clinical, and FDG PET/CT radiomics variables to compensate for the lack of a validating cohort and to decrease the possibility of over-fitting the final model. In both logistic and Cox models, variables with a
p-value of less than 0.10 in the univariate analysis were considered for inclusion in the multivariate analysis, and variables with a
p-value of less than 0.05 were retained in the final model using backward elimination. Pearson correlation was calculated to check the correlation between clinical, PET, and CT radiomic factors. In addition, predictors with high variance inflation factors were excluded from the models to avoid multicollinearity caused by correlated predictors. The average Brier score and the area under the receiver operating characteristic curve (AUC), which indicate the predictive accuracy of a model, were used to determine whether the CT and PET variables would improve predictive accuracy over the demographic and clinical risk factors. All statistical analyses were performed in R (version 3.6.3, R Foundation for Statistical Computing,
https://www.R-project.org/ May 2021).
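The two accuracy measures used for model comparison can be sketched in a few lines of Python. The study's analysis was performed in R; this standalone version only illustrates the definitions, and the outcome labels and predicted probabilities below are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored above a
    random negative case; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out outcomes and predictions from two competing models.
y = [1, 0, 1, 0, 1, 0]
clinical_only = [0.6, 0.5, 0.4, 0.3, 0.7, 0.6]
clinical_plus_imaging = [0.8, 0.3, 0.6, 0.2, 0.9, 0.4]

brier_clinical = brier_score(y, clinical_only)
brier_combined = brier_score(y, clinical_plus_imaging)
auc_clinical = auc(y, clinical_only)
auc_combined = auc(y, clinical_plus_imaging)
```

A lower Brier score and a higher AUC for the combined model would indicate that the imaging variables add predictive accuracy beyond the clinical factors. In a cross-validated setting, both measures would be computed on each held-out fold and then averaged.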
3. Results
3.1. Study Population
Overall, 88 patients, 42 women (48%) and 46 men (52%), with a median age of 43.3 years (range 21–85), were included.
Initial curative treatment was intended for all the patients. Combined doxorubicin + bleomycin + vinblastine + dacarbazine (ABVD) was the initial therapy of choice in 91% (n = 79) of patients, with 62% (n = 54/88) receiving 6 cycles and 94% of them (n = 82/88) completing therapy as initially planned at tumor boards. Of those, 84% (n = 72/88) achieved CMR.
Overall, 48% (n = 43/88) underwent additional radiotherapy for residual FDG-avid disease or due to initial bulky disease, achieving a complete metabolic response in 95% (n = 41/43).
At a median follow-up of 33.9 months (range 6–65), response to treatment was complete response (CR) in 88% of patients (n = 76), progressive disease (PD) in 8% of patients (n = 7), partial response (PR) in 2% of patients (n = 2), stable disease (SD) in 1% of patients (n = 1), and not evaluated in 2% of patients (n = 2) because of loss to follow-up. There were 10 adverse events during the follow-up period (defined as death, progression based on follow-up CT or PET/CT, or relapse), corresponding to 11.4%.
A summary of the patient population demographics, clinical information, and laboratory results is presented in
Table 1 and
Table 2, respectively.
3.2. Univariate Analysis
The statistically significant results of the univariable Cox regression analysis for CT, PET, and clinical parameters when considering either nodal-only involvement or all sites of disease involvement, as well as correlation with PFS and predictors of radiotherapy, are summarized in
Table 3,
Table 4 and
Table 5, respectively. Of note, only one CT parameter (the GLZLM SZHGE mean) was found to be significant for the prediction of the need for radiotherapy in both categories (nodal vs. all sites), whereas several yet similar parameters (shape and GLRLM) were found to be significant for the prediction of the PFS endpoint. The results for PET showed similar trends: shape, GLRLM, and GLZLM features were found to be significant in all evaluation categories. Interestingly, a rather ‘standard’ feature such as TLG was found to be predictive as well. A summary of bivariate correlation coefficients is presented in
Table 6.
3.3. Multivariate Analysis (MVA) Parameters as Predictors of PFS
Multivariable Cox regression analysis was performed based on significant parameters (
p < 0.1) from univariate analysis (UVA). MVA was performed in a backward manner with a stay criterion of
p < 0.05. The parameters with the lowest
p-value (clinical as well as imaging parameters) were used for model building for the PFS model, including Albumin (
p = 0.034); ALP (
p = 0.028), and CT grayscale parameter grey level run length matrix non-uniformity for a run (GLRLM GLNU mean (HR = 2.52, 95% CI (1.22, 5.18)
p = 0.012)). Significant parameters and forest plots are presented in
Table 7 and
Table 8 and
Figure 2 and
Figure 3, respectively. Graph plots are presented in
Figure 4 and
Figure 5.
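As a plausibility check on ratio estimates such as the hazard ratio for GLRLM GLNU reported above, the coefficient, standard error, and two-sided p-value can be approximately recovered from a ratio and its 95% CI. The sketch assumes a Wald-type interval that is symmetric on the log scale, which is a common default but is not confirmed by the text; the function name is hypothetical.

```python
import math


def wald_from_ratio(ratio, ci_low, ci_high, z_crit=1.959963984540054):
    """Recover log-scale coefficient, standard error, z, and two-sided p-value
    from a ratio estimate (HR or OR) and its 95% CI, assuming a Wald interval."""
    beta = math.log(ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p_value


# HR and 95% CI for CT GLRLM GLNU mean as reported in the PFS model.
beta, se, z, p_value = wald_from_ratio(2.52, 1.22, 5.18)
```

Under this assumption, the recovered p-value is close to the reported p = 0.012, so the HR, its CI, and the p-value are mutually consistent.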
3.4. MVA Parameters as Predictors of the Need for Radiotherapy
For the prediction of the need for radiotherapy, a few parameters were significant in the UVA, including advanced stage (Stages III and IV combined). This parameter was, however, excluded from the MVA given that it was already predefined by the images and could potentially introduce bias. Therefore, the first-order PET parameter SHAPE Sphericity (OR = 1.9, 95% CI (1.05, 3.42), p = 0.033); the CT parameter grey level zone length matrix short-zone high gray-level emphasis (GLZLM SZHGE mean (OR = 2, 95% CI (1.08, 3.73), p = 0.028)); PARAMS XSpatial Resampling (OR = 2.1, 95% CI (1.2, 3.68), p = 0.0091); as well as abnormal hemoglobin results (OR = 0.26, 95% CI (0.09, 0.78), p = 0.016) remained as independent features in the final model for the binary outcome as predictors of radiotherapy (AUC = 0.79).
4. Discussion
In our study, we evaluated the utility of combined PET/CT radiomic features as well as clinical parameters for outcome prediction in patients with Hodgkin lymphoma.
So far, only a few radiomics studies have been performed in HL populations addressing outcome prediction, and even fewer have considered clinical parameters as well as combined PET and CT features for outcome prediction. We found that CT as well as PET radiomics combined with clinical parameters might be able to help predict outcome endpoints such as PFS as well as the need for additional radiotherapy.
We found several radiomics parameters from baseline FDG PET/CT to be predictors of survival rate and predictors of the need for radiotherapy in the univariable analysis. However, when multivariable models were designed, considering the parameter with the lowest
p-value for the model building, no PET-related parameter was found to be an independent predictor for PFS. This is concordant with a few earlier studies that evaluated a similar question, including first-order parameters such as SUVmax. For example, Frood et al. [
27] recently published a meta-analysis of baseline PET/CT imaging parameters as predictors of treatment outcome in Hodgkin and diffuse large B-cell lymphomas (DLBCL). The meta-analysis included 10 studies assessing SUVmax as a predictor of response; however, none of the studies evaluated radiomics features. The largest study, by Akharti et al. [
35], demonstrated that SUVmax could not be applied to predict either PFS or OS in 267 patients. Interestingly, in our study, a CT second-order parameter was a predictor of survival when combined with clinical parameters such as an abnormal albumin level and an elevated ALP.
Driessen et al. [
25] recently presented a radiomics analysis in a larger cohort of patients with relapsed HL. They found that a combination of radiomics and clinical features results in a strong prediction model for 3-year time to progression. The model uses robust PET features that address inter-lesional heterogeneity in the distance, metabolic volume, and SUV but did not include any second or higher-order radiomic features, as compared to our study. In addition, this investigation did not include a radiomics evaluation of the CT component of the PET/CT.
A recent study by Zhou and co-workers evaluated if the radiomic features of baseline FDG PET could predict the prognosis of Hodgkin lymphoma [
36]. They found that long-zone high gray-level emphasis and Dmax were independently correlated with 2-year progression-free survival, although this study did not evaluate complementary CT-radiomics and did not integrate any clinical information into their AUC analysis. Furthermore, they evaluated a smaller number of patients, which were further divided into a training and validation data set, which likely decreased statistical robustness.
Another study took a somewhat different approach, assessing the ability to predict therapy response in 45 patients receiving R-CHOP (Rituximab + Cyclophosphamide + Doxorubicin + Vincristine + Prednisone) chemotherapy for DLBCL [
37]. Here, the authors concluded that SUVmax and gray-level co-occurrence matrix dissimilarity were independent predictors of lesions with an incomplete response.
Milgrom et al. [
29] analyzed a cohort of 251 mediastinal HL patients using another freely available software package (IBEX). They found that the first-order parameters MTV and TLG were associated with disease progression in HL. None of the second-order parameters were predictors of progression in their cohort either.
Lue et al. [
24] investigated 11 first-order and 39 higher-order features in 42 patients with HL to predict PFS and OS. With 21 events in the cohort (12 relapses, 9 deaths), it was demonstrated that SUV, kurtosis, stage, and intensity non-uniformity (INU) derived from the grey-level run length matrix (GLRLM) were independent predictors of PFS, and only disease stage and INU derived from the GLRLM were independent predictors of OS.
Overall, in contrast to the relatively sparse, directly comparable literature, none of the PET-derived radiomic features in our study were found to be independent features in the MVA for the PFS outcome. Since several PET radiomic features were significant in the UVA, it might be that, had we evaluated only PET radiomic features, those parameters would have been significant in the MVA, and our results would have been more comparable to those of other studies. However, in our investigation, PET radiomics parameters were ‘outperformed’ by the CT radiomic features (which consequently ended up in the MVA), and our results are therefore not directly comparable to the available studies. We feel that, since PET/CT is a hybrid imaging modality in clinical routine, both components (the PET and the CT) should be evaluated in a complementary fashion, and as demonstrated, there appears to be value in CT-derived textural features as well.
However, in our cohort, a PET first-order parameter, SHAPE Sphericity, and the CT second-order features, GLZLM SZHGE mean and PARAMS XSpatial Resampling, were independent predictors of the need for radiotherapy when combined with a lower hemoglobin result on baseline lab work (AUC = 0.79), which again underlines the value of combined radiomic evaluation of PET and CT. It has to be pointed out that the clinical decision to apply additional radiotherapy is often multifactorial and that more than one clinical scenario can indicate the need for radiotherapy in HL patients. In our institution, this decision is made mostly following the H10 trial [
41], considering whether radiotherapy or a combined modality will be more beneficial for the early stages of disease in a personalized approach that is decided in most of the cases at multidisciplinary rounds by consensus. Individual factors such as the size of the radiation field and the organs included or in the vicinity, gender, age, and the risk of toxicity are all weighted factors in that decision.
Based on our analysis, the integration of combined CT and PET radiomic features might offer further guidance in deciding which patients might benefit from additional radiotherapy for the improvement of their disease outcome. Other studies have addressed the same dilemma, including that of Picardi et al., who correlated histologically proven residual disease with PET/CT foci showing a Deauville score of 4 after completion of the first line of chemotherapy [
42].
Similar to other studies cited above, only first-order and morphologic PET radiomic features were found to be significant and, thus, not necessarily intrinsically related to voxel characteristics. For CT, however, two second-order features were found to be of value (i.e., GLZLM SZHGE). As for the comparative literature, no other studies evaluated predictors for the need for radiotherapy besides the bulkiness of the tumor, and therefore this finding may open a window for further analysis in larger cohorts.
Several other new studies have evaluated different aspects of radiomics, i.e., in MRI or PET, but those studies concentrated on the technical aspects of the analysis itself rather than the ability of radiomics for prediction. In addition, PET/CT radiomics has been thoroughly evaluated in non-Hodgkin lymphoma and in the context of prediction for bone marrow involvement, but as this was done for follicular lymphoma, these studies are not specifically relevant for HL patients [
43,
44,
45,
46].
Concerning the integration of clinical parameters, it has been shown in the literature that ALP is not necessarily a predictive clinical parameter on its own. While that is certainly valid from a dedicated clinical perspective, in our cohort it was found to have predictive value in conjunction with the imaging features evaluated here. Thus, the integration of combined PET and CT radiomic features may elevate the value of specific clinical parameters when evaluated in conjunction.