Baseline 18F-FDG PET/CT Radiomics in Classical Hodgkin’s Lymphoma: The Predictive Role of the Largest and the Hottest Lesions

This study investigated the predictive role of baseline 18F-FDG PET/CT (bPET/CT) radiomics from two distinct target lesions in patients with classical Hodgkin’s lymphoma (cHL). cHL patients examined with bPET/CT and interim PET/CT between 2010 and 2019 were retrospectively included. Two bPET/CT target lesions were selected for radiomic feature extraction: Lesion_A, with the largest axial diameter, and Lesion_B, with the highest SUVmax. Deauville score at interim PET/CT (DS) and 24-month progression-free survival (PFS) were recorded. A Mann–Whitney test identified the most promising image features (p < 0.05) from both lesions with regard to DS and PFS; all possible radiomic bivariate models were then built through logistic regression analysis and trained/tested with a cross-fold validation test. The best bivariate models were selected based on their mean area under the curve (mAUC). A total of 227 cHL patients were included. The best models for DS prediction had 0.78 ± 0.05 maximum mAUC, with a predominant contribution of Lesion_A features to the combinations. The best models for 24-month PFS prediction reached 0.74 ± 0.12 mAUC and mainly depended on Lesion_B features. bPET/CT radiomic features from the largest and hottest lesions in patients with cHL may provide relevant information in terms of early response to treatment and prognosis, thus representing an earlier and stronger decision-making support for therapeutic strategies. External validations of the proposed model are planned.


Introduction
Hodgkin's lymphoma (HL) is a rare B-cell malignancy with an estimated incidence of 2.3 to 2.6 cases per 100,000 people per year [1,2]. The so-called "classical" HLs (cHL) represent the vast majority of HL cases (about 95%) and are distinguished from nodular lymphocyte-predominant types by the indolent presentation and more favorable prognosis of the latter [1,3]. Nowadays, 18F-FDG PET/CT plays a central role in the management of this disease. Baseline 18F-FDG PET/CT (bPET/CT) enables patients' risk stratification [1,4–6], as the five-year relative survival rate relies strongly on disease stage [2], and also supports therapeutic planning [3,5]. Interim PET/CT (iPET/CT), performed after two to four courses of primary chemotherapy (PCT), has proven its usefulness, as numerous trials have used an iPET/CT response-adapted approach to evaluate early escalation or de-escalation of therapy [7]. In particular, the 5-point Deauville score (DS) derived from iPET/CT has shown high prognostic efficacy [8,9].
The extraction of parameters and the construction of radiomic models are long and complex processes requiring a rigorous approach. Several methodological frameworks have been proposed [16,24]. The variability of target volume delineation protocols, the lack of validation cohorts, and the lack of methodological harmonization are factors that limit the diffusion of this type of approach in clinical routine [13,25,26].
This study aimed at investigating whether bPET/CT radiomic models, which are derived from two distinct and easy-to-identify target lesions (the largest and the hottest), could predict tumor aggressiveness and patients' prognosis considering early response to PCT (DS at iPET/CT) and progression-free survival (PFS) in a large monocentric cohort of cHL patients. In addition, inter-scanner performance differences were assessed.

Study Design, Patients, and Data Collection
This retrospective study was approved by the Ethical Committee of Fondazione Policlinico Universitario A. Gemelli IRCCS (study code 3834), and all included subjects signed an informed consent form.
Medical records of all patients consecutively diagnosed with HL and referred to the hematology unit between September 2010 and October 2019 were reviewed. Patients were included if they had undergone a bPET/CT and an iPET/CT after the first two cycles of PCT and had an available clinical follow-up of at least 2 years. Exclusion criteria were nodular lymphocyte-predominant HL (LPHL) histology, presence of other synchronous/metachronous tumors, extensive surgical resection of disease for diagnostic purposes before bPET/CT, and first evaluation at disease relapse.

Image Acquisition Protocol
PET/CT studies were acquired according to European Association of Nuclear Medicine guidelines [27]. Patients fasted for ≥6 h, and their blood glucose levels were <200 mg/dL before administration of 236 ± 50 MBq of 18F-FDG. Images were acquired after 60 ± 10 min of uptake time using a Gemini GXL (Philips Healthcare, Cleveland, OH, USA) or a Biograph mCT (Siemens Healthineers, Erlangen, Germany) PET/CT scanner, applying the respective standard reconstruction protocol (Table 1) [27].

Image Segmentation and Radiomic Features' Extraction
PET/CT images were reviewed by two experienced nuclear medicine physicians blinded to patients' clinical and follow-up data.
For target lesion contouring, a semiautomatic gradient-based segmentation tool (PET Edge, version 7.0.5 of MIM Encore Software Inc., Cleveland, OH, USA) [28,29] was used to delineate volumes of interest (VOIs) on two distinct nodal lesions for each bPET/CT scan (Figure 1). Target Lesion_A was the lesion with the largest axial diameter (Dmax) identified on CT images. When a bulky tumor was present, Lesion_A was identified as the single distinct lesion with Dmax among contiguous lesions in the bulk. Target Lesion_B was the lesion with the highest SUVmax. When more than one lesion visually showed similar 18F-FDG uptake, a VOI was drawn around each one to choose the hottest. The conventional parameters Dmax, SUVmax, SUVmean, and metabolic tumor volume (MTV) at 40% of SUVmax threshold (MTV40) were extracted for each VOI. No manual adjustment was added to the segmentation process.
To add the total metabolic tumor volume (TMTV) to the conventional parameters described above, a total-body PET segmentation tool (LesionID, version 7.0.5 of MIM Encore Software Inc., Cleveland, OH) was applied to each bPET/CT scan. As described in [30], the program workflow first used a PET Response Criteria in Solid Tumors (PERCIST)-based background threshold (liver) to identify all lesions with higher uptake, then applied a fixed relative threshold of ≥41% of the SUVmax of each VOI to create the boundaries of the metabolically active region within each lesion. Physicians were required to reject false-positive lesions (sites of physiological uptake, external contamination, and pathologic uptake deemed lymphoma-unrelated) before all approved VOIs were computed to obtain TMTV [30,31].
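The fixed relative threshold idea behind MTV40 and the ≥41% TMTV boundary can be illustrated with a minimal sketch. This is not the MIM PET Edge or LesionID implementation; the function name `mtv_relative_threshold`, the voxel size, and the SUV values are hypothetical and chosen only to show the arithmetic.

```python
def mtv_relative_threshold(suv_values, voxel_volume_ml, fraction=0.40):
    """Return (SUVmax, SUVmean, MTV in mL) for voxels >= fraction * SUVmax.

    suv_values: SUVs of all voxels inside one VOI (flat list).
    voxel_volume_ml: volume of a single voxel in millilitres.
    """
    if not suv_values:
        raise ValueError("VOI contains no voxels")
    suv_max = max(suv_values)
    cutoff = fraction * suv_max
    kept = [v for v in suv_values if v >= cutoff]  # metabolically active voxels
    suv_mean = sum(kept) / len(kept)
    mtv_ml = len(kept) * voxel_volume_ml
    return suv_max, suv_mean, mtv_ml


# Hypothetical VOI: eight voxels of 4 x 4 x 4 mm (0.064 mL each).
voi = [1.2, 2.0, 3.9, 4.1, 6.5, 8.1, 9.7, 10.0]
suv_max, suv_mean, mtv40 = mtv_relative_threshold(voi, 0.064, fraction=0.40)
print(suv_max, round(suv_mean, 2), round(mtv40, 3))
```

In this toy VOI the cutoff is 4.0, so five voxels are retained. Under the same assumptions, a total volume would be the sum of the per-lesion volumes over all VOIs approved by the reader.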
For the two target lesions, a rich set of additional radiomic features was extracted using Moddicom [32], an open-source software library in R [33] and Image Biomarker Standardization Initiative (IBSI)-compliant [34]. Moddicom's image features belonged to the following IBSI classes: morphological, intensity-based statistical, intensity-histogram, grey-level co-occurrence matrix (GLCM), grey-level run-length matrix (GLRLM), and grey-level size-zone matrix (GLSZM) [34] (Supplemental Table S1). No spatial interpolation or kernel-based filter application to the images was needed before running them in the software due to the homogeneous geometry in the DICOM series.
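To make the GLCM class more concrete, the following minimal sketch computes two GLCM features (joint entropy and correlation) on a tiny 2D image. It is not Moddicom code: it assumes a single 2D offset, a symmetric matrix, and grey levels already discretized to 1..N, whereas IBSI-compliant software aggregates over several 3D directions. The function names are illustrative.

```python
import math


def glcm_symmetric(image, levels, offset=(0, 1)):
    """Normalised symmetric GLCM of a 2D image with grey levels 1..levels."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c] - 1, image[r2][c2] - 1
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def joint_entropy(p):
    """Joint entropy: -sum(p_ij * log2(p_ij)) over non-zero entries."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


def correlation(p):
    """Correlation of a symmetric GLCM, using the marginal mean and variance."""
    n = len(p)
    marg = [sum(row) for row in p]
    mu = sum((i + 1) * marg[i] for i in range(n))
    var = sum((i + 1 - mu) ** 2 * marg[i] for i in range(n))
    if var == 0:
        return float("nan")  # undefined for a single grey level
    cov = sum((i + 1 - mu) * (j + 1 - mu) * p[i][j]
              for i in range(n) for j in range(n))
    return cov / var


checker = [[1, 2], [2, 1]]  # horizontal neighbours always differ
stripes = [[1, 1], [2, 2]]  # horizontal neighbours always match
for name, img in (("checker", checker), ("stripes", stripes)):
    p = glcm_symmetric(img, levels=2)
    print(name, joint_entropy(p), correlation(p))
```

The two toy images share the same joint entropy but have opposite correlation, which illustrates why features from the same matrix can carry different information.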

Statistical Analysis and Radiomic Models
For early response-to-treatment assessment, complete metabolic response corresponded to iPET/CT DS 1-3, while DS 4-5 was associated with partial/no metabolic response [5]. Twenty-four-month PFS was defined as the interval between histological diagnosis of cHL and the first clinical detection of progression during treatment, treatment escalation, lack of complete remission after PCT, or disease relapse. Figure 2 shows the statistical workflow employed. Briefly, a Mann-Whitney test was performed to identify the most promising image features (p < 0.05) from Lesion_A and Lesion_B with regard to DS (<4 or ≥4) and 24-month PFS ("no event" or "event" at 24 months). Among the statistically significant features, the first 60 were used to build all possible bivariate models through logistic regression (LR) analysis. LR bivariate models were trained/tested with a cross-fold validation test (training set vs. testing set: 80% vs. 20%, 20 repetitions). The best models were then selected on the basis of receiver operating characteristic (ROC) curves, mean area under the ROC curve (mAUC), and its standard deviation (SD). Moreover, the same statistical workflow was applied to analyze radiomic data separately for each scanner (Scanner_1: Gemini GXL; Scanner_2: Biograph mCT).
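The workflow described above (univariate screening, all feature pairs, repeated 80%/20% evaluation, ranking by mAUC) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the feature names are hypothetical, the Mann-Whitney p-value uses a normal approximation, the logistic regression is fitted by plain gradient descent, and the "cross-fold validation test" is interpreted here as repeated stratified hold-out splits.

```python
import itertools
import math
import random
import statistics


def auc(scores, labels):
    """AUC = Mann-Whitney U (positives vs negatives) divided by n1 * n0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mann_whitney_p(values, labels):
    """Two-sided p-value, normal approximation (no tie/continuity correction)."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    u = auc(values, labels) * n1 * n0
    sd = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    return math.erfc(abs(u - n1 * n0 / 2) / (sd * math.sqrt(2)))


def fit_logistic(X, y, lr=1.0, epochs=150):
    """Batch gradient-descent logistic regression; returns [bias, w1, ..., wk]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z)))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def repeated_holdout_auc(X, y, reps=20, test_frac=0.2, seed=0):
    """Mean and SD of the test AUC over repeated stratified 80/20 splits."""
    rng = random.Random(seed)
    by_class = [[i for i, yi in enumerate(y) if yi == c] for c in (0, 1)]
    aucs = []
    for _ in range(reps):
        test = []
        for idx in by_class:
            shuffled = rng.sample(idx, len(idx))
            test += shuffled[: max(1, round(test_frac * len(idx)))]
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        # Standardise with training statistics only, to avoid leakage.
        cols = list(zip(*[X[i] for i in train]))
        mu = [statistics.fmean(c) for c in cols]
        sd = [statistics.pstdev(c) or 1.0 for c in cols]

        def scale(row):
            return [(v - m) / s for v, m, s in zip(row, mu, sd)]

        w = fit_logistic([scale(X[i]) for i in train], [y[i] for i in train])
        scores = [w[0] + sum(a * b for a, b in zip(w[1:], scale(X[i])))
                  for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return statistics.fmean(aucs), statistics.stdev(aucs)


# Synthetic stand-in data (NOT study data): 120 cases, 3 hypothetical features.
rng = random.Random(42)
y = [1] * 40 + [0] * 80
features = {
    "feat_a": [rng.gauss(1.5 * yi, 1.0) for yi in y],
    "feat_b": [rng.gauss(1.2 * yi, 1.0) for yi in y],
    "noise": [rng.gauss(0.0, 1.0) for _ in y],
}
selected = [k for k, v in features.items() if mann_whitney_p(v, y) < 0.05]
results = {}
for a, b in itertools.combinations(selected, 2):
    X = list(zip(features[a], features[b]))
    results[(a, b)] = repeated_holdout_auc(X, y)
for pair, (m, s) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(pair, f"mAUC = {m:.2f} ± {s:.2f}")
```

If 60 screened features were pooled into a single list, `itertools.combinations` would yield 60 × 59 / 2 = 1770 candidate pairs, each evaluated over 20 splits. Stratifying the splits is a choice made here so that every test set contains both outcome classes; the source does not state whether stratification was used.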

Patients' Characteristics
Among the 247 patients with cHL referred to the hematology unit between 2010 and 2019, 227 fulfilled the inclusion criteria and were included in the study (Figure 3). Patients' characteristics are reported in Table 2. Disease stage was limited (I/II) in 51.5% (117/227) of patients and advanced (III/IV) in 48.5% (110/227). Mean follow-up time was 56 months (range, 2-127). Adverse events at 2 years from bPET/CT were recorded in 46 patients. The 24-month PFS was 78.46%.

Table 3 shows bPET/CT conventional parameters extracted from Lesion_A and Lesion_B (Dmax, SUVmax, SUVmean, and MTV40) and TMTV. Their statistical correlation with the outcomes is also itemized. For DS prediction, TMTV, Lesion_A_Dmax, and Lesion_A_MTV40 were among the first 60 significant features at the Mann-Whitney test in the univariate phase of the radiomic computational pipeline (Figure 2). Among the first 60 most relevant features for 24-month PFS prediction, the conventional PET/CT parameters were Lesion_A_Dmax, Lesion_A_SUVmax, Lesion_A_SUVmean, Lesion_A_MTV40, Lesion_B_Dmax, Lesion_B_SUVmax, Lesion_B_SUVmean, Lesion_B_MTV40, and TMTV. Figure 4 shows the anatomical locations of Lesion_A and Lesion_B among all patients. The mediastinum was the most frequent site for both lesions (Figure 4).

Radiomic Models
The two best bivariate models at the cross-fold validation test for DS prediction are shown in Figure 5A, with a maximum mAUC of 0.78 ± 0.053 for the model combining a GLCM correlation feature from Lesion_A and the GLCM joint entropy feature from Lesion_B. Overall, Lesion_A contributed predominantly to the best bivariate models with features from several IBSI classes, while Lesion_B contributed little and almost only with the "entropy" feature from the GLCM class (Supplemental Table S2). The two best bivariate models for 24-month PFS prediction are shown in Figure 5B, with the maximum mAUC (0.74 ± 0.12) found for the combination of TMTV and a Lesion_B GLRLM feature. In this case, the overall best combinations included no Lesion_A radiomic features. Lesion_B instead, especially combined with TMTV, was the most represented, with features belonging to numerous IBSI classes (Supplemental Table S3).

Scanner-Based Radiomic Models
At bPET/CT, 119/227 patients (52.4%) were scanned on Scanner_1 and 108/227 (47.6%) on Scanner_2. For DS prediction, Scanner_1 was the best-performing for every relevant feature. The best bivariate radiomic models came from the combination of Lesion_A features belonging to different IBSI classes, more often GLCM-related ones (Supplemental Table S4), with a minor contribution of TMTV. The best mAUC (0.95 ± 0.06) was obtained by combining Lesion_A entropy and autocorrelation features from the GLCM class (Figure 6A). The best bivariate radiomic models for 24-month PFS prediction were found with features extracted from Scanner_2 images (Figure 6B, maximum mAUC: 0.87 ± 0.14), with features almost all belonging to Lesion_B (Supplemental Table S5).
Figure 5. First two best bivariate radiomic models at cross-fold validation test for DS prediction (A) and for 24-month PFS prediction (B) in the overall cohort. The continuous blue line represents mAUC. The dotted blue lines represent +/− standard deviation.

Figure 6. First two best bivariate radiomic models at cross-fold validation test for DS prediction with data obtained from Scanner_1 images (A) and for 24-month PFS prediction with data obtained from Scanner_2 images (B). The continuous blue line represents mAUC. The dotted blue lines represent +/− standard deviation.

Discussion
This study shows that bivariate models of bPET/CT radiomic features primarily from the largest lesion (Lesion_A) likely predict cHL patients' early response during PCT (DS at iPET/CT), while bivariate models of radiomic features primarily from the hottest lesion (Lesion_B) seem to predict patients' long-term outcome (PFS). The dichotomy became even more evident, and radiomic models also had higher prognostic power, when image data were analyzed separately by scanner. Conceivably, how strongly a feature informs a specific outcome may depend on the underlying scanner technology. Indeed, bivariate models of mainly morphology-related radiomic features belonging to Lesion_A (the largest) had higher significance for DS prediction when extracted from Scanner_1 (Philips Gemini GXL). A possible explanation is that the morphology of the lesions might have been influenced by PSF modeling, TOF, and image smoothing applied during image reconstruction on Scanner_2 (Siemens Biograph mCT) but not on Scanner_1. Instead, for PFS prediction, bivariate analysis was successful on features belonging to Lesion_B (the hottest) and on Scanner_2. Interestingly, the best models often combined features from GLSZM and GLRLM classes, which deal with grey-level discretization [34]. Scanner_2 has a more up-to-date technology compared to Scanner_1, and its higher spatial resolution may explain our results.
Besides the two target lesions, significant prognostic power was also provided by TMTV for both DS and PFS. This finding is in line with other studies [35,36]. However, an applicable threshold has not currently been identified for TMTV as for other radiomic features, and calculation methodology has varied among studies [12,13,37,38], preventing its use in clinical routine settings [13,16]. In our study, a gradient-based algorithm for target volume segmentation was employed, considering its known higher accuracy and consistency compared to constant-threshold or manual contouring (especially for lesions <2 cm) [28,39]. TMTV instead was assessed using a fixed relative threshold method [37], which has been previously described as a single automatic program workflow [38,40,41] and deemed to allow fast, reproducible, and practical calculations in patients with disseminated disease [38]. However, recent investigations using different scanners and segmentation methods in both HL and non-HL have interestingly concluded that calculations from different contouring techniques generate similar results [38,42,43].
To our knowledge, this is the first study in the literature proposing a reasonable trade-off between previously suggested numbers of target lesions and methodological approaches, and we believe it could draw a feasible path towards application in clinical practice. Previous reports have described correlation between tumoral bulks and DS [19,44], high SUVmax and tumor aggressiveness [45–47], radiomics from one/several/all lymphoma lesions, and prognosis in univariate/multivariate analyses, though mostly in small cohorts and with no model validation [12,13].
However, the present study has some limitations. The first is its retrospective, single-center nature, which limited the sample size. Although a relatively large cohort was recruited compared to other cHL studies in the literature, we recognize it is still limited for our results to be generalizable. Moreover, this characteristic limited the possibility of performing per-stage and per-histological subtype sub-analyses. On the one hand, the heterogeneity of our population may be seen as representative of a real-life clinical scenario and as avoiding inclusion bias, with a disease stage distribution matching that of the US/EU population, with stage II being the most frequent at diagnosis, followed by stages IV, III, and I. On the other hand, histology and stage at diagnosis are known to impact patients' response to therapy and survival [1,2,4–6] and will therefore need to be taken into account in further larger studies. We also limited our analysis to PET parameters, overlooking CT parameters other than Dmax [17,48]. All volumes corresponding to the chosen criteria for radiomic target lesions were analyzed regardless of size, but the use of an IBSI-compliant platform that succeeded in analyzing small volumes may rather be seen as overcoming what was a challenge in other studies [43]. The use of images acquired from different scanners and the lack of resampling or harmonization of the acquired images may be considered as further limitations [49]. Conversely, these traits, combined with the avoidance of manual adjustments to the proposed image contouring method, may allow time saving, consistency, and generalization of our results to other institutions for a desirable external validation, which would help in strengthening and bringing forward the decision-making process in cHL patients' treatment by combining the prognostic power of two target lesions at bPET/CT.
Multicenter studies with a larger and more homogeneous cohort may allow for stronger results and comparison with conventional clinical prognostic models.

Conclusions
This large monocentric retrospective study on cHL patients offers a broader insight into baseline PET/CT and shows that bivariate radiomic models from the largest and the hottest lesions provide significant information about patients' outcome (DS at iPET/CT and PFS, respectively). TMTV has prognostic significance for both DS and PFS. Bivariate models of higher prognostic power were found when the underlying scanner technology was considered, unveiling possible image morphological distortions following the application of multiple reconstruction algorithms.
Further studies including correlations with clinical parameters and external validation of our proposed model are warranted.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13081391/s1, Table S1: Radiomic features classes and their components, extracted from Lesion_A and Lesion_B through Moddicom software; Table S2: Best bivariate radiomic models for Deauville score prediction (whole cohort); Table S3: Best bivariate radiomic models for 24-month progression-free survival prediction (whole cohort); Table S4: Best bivariate radiomic models for Deauville score prediction (Scanner_1 images); Table S5: Best bivariate models for 24-month progression-free survival prediction (Scanner_2 images).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.