CT Radiomics and Clinical Feature Model to Predict Lymph Node Metastases in Early-Stage Testicular Cancer



Introduction
Testicular germ cell tumours (TGCTs) are the most common malignancy among men aged 15-40 [1,2]. The characteristic patient population and high cure rate make this disease unique, constituting one of the few success stories in cancer care [3,4]. Besides cure, reducing therapy-related acute and long-term toxicity is a central goal of care, given the young age of TGCT patients and their long life expectancy following curative therapy [5][6][7][8][9][10]. The main risk factors for TGCTs include cryptorchidism, a family or personal history of TGCT, and contact with organochlorine compounds [11,12]. TGCTs are classified histologically into seminoma and non-seminoma, the latter including pure non-seminoma and mixed germ cell tumours. Seminoma accounts for approximately 55% of all cases, with an average age at diagnosis in the fourth decade of life, about eight years later than non-seminoma [12]. TGCTs are diagnosed by physical examination, testicular ultrasound and specific tumour markers, such as alpha-fetoprotein (AFP), beta-human chorionic gonadotropin (β-hCG) and lactate dehydrogenase (LDH) [13,14].
Ninety-five percent of all metastases from TGCTs involve the ipsilateral retroperitoneal lymph nodes. Thus, the present German guidelines recommend adjuvant therapy for early-stage seminoma patients who meet certain criteria, such as a tumour diameter >4 cm; this consists of either one to two cycles of carboplatin or radiotherapy of the paraaortic region with 20 Gy [15]. However, retroperitoneal lymph node dissection (RPLND) is the only treatment modality that correctly stages the nodal status of early testicular cancer. Owing to its short- and long-term complications, such as retrograde ejaculation, the implementation of adjuvant chemotherapy regimens, and the excellent prognosis with surveillance approaches in stage I disease, RPLND plays a negligible role as the primary treatment of early-stage TGCTs [16]. The most commonly used tumour markers, AFP, β-hCG, and LDH, are not very specific and are elevated in only about 60% of men with testicular cancer [14,17]. Moreover, some conditions, such as liver disease or genetic factors, lead to false-positive elevation of these markers [18].
Due to its exceptional spatial resolution, CT imaging is regarded as well suited for identifying pathologically enlarged lymph nodes; in clinical practice, a short axis larger than 7-8 mm is considered pathologic, with sensitivity and specificity approaching 70% [19]. Nonetheless, CT cannot distinguish between affected and normal lymph nodes when the nodes are small [20].
Suboptimal therapeutic management, however, jeopardises the excellent outcomes of TGCT patients, with over- and undertreatment being equally harmful.
Advanced medical imaging integrating high-resolution image acquisition, powerful computational technologies and artificial intelligence (AI)-based image analysis enabled researchers to develop the field of radiomics [21,22]. This way, data characterisation algorithms can detect specific diagnostic image patterns and convert them into quantitative mineable "big data" [23,24].
In the era of precision medicine, AI-based image analysis addresses the challenges of biopsy: it is non-invasive, repeatable, and applicable to hard-to-reach lesions within the body. It analyses texture features of a region of interest (ROI) that, according to current data, reflect tumour physiology and radiologic phenotype [25,26].
Many studies have evaluated the diagnostic potential of radiomics for classifying lymph nodes in different cancer types, including gastric, rectal, and bladder cancer, with promising results [27][28][29][30]. AI-based advanced imaging could provide new imaging biomarkers or radiomic signatures to combat the urgent problem of under- or overtreatment of TGCT patients.
Our study is the first to investigate computed tomography (CT) radiomics models integrating clinical risk factors for the individualised prediction of lymph node metastasis in patients with early-stage TGCT, thus promoting precision imaging in clinical oncology.
Based on the findings above, we hypothesised that: (1) The radiomics features extracted from retroperitoneal lymph nodes might potentially predict TGCT recurrence. (2) Integrating important clinical factors, including age, histotype, AFP, β-hCG, and BMI, into a combined clinical-radiomics model might add an incremental value to predict TGCT recurrence.

Patients and Imaging Protocol
Ninety-one treatment-naive patients with surgically proven stage I TGCT who underwent contrast-enhanced CT scans at our institution between January 2006 and December 2016 were included in this retrospective study.
Patient demographic, laboratory and clinical data were collected through a careful review of electronic medical records and the radiology information system. Exclusion criteria included incomplete clinical or imaging records and no histologic confirmation after surgery.
The primary endpoint of our study was retroperitoneal LN metastasis from TGCT, established from subsequent clinical and imaging examinations as documented in the electronic medical records.
Of the 167 patients originally screened, 91 could be included in the final study cohort according to the selection criteria. The patients in the final study cohort were followed up for at least six years after orchiectomy.
A flowchart of the cohort selection is shown in Figure 1. CT scans were conducted before orchiectomy (+/−2 weeks) (mean time 3 ± 11 days, range 2-24) to determine disease status. Images were obtained as part of the routine staging on the Philips Brilliance CT 16-channel multi-row detector CT or Philips Brilliance CT 64-channel scanner (Philips Healthcare, Cleveland, OH, USA). CT scans were performed using the acquisition and reconstruction parameters of the standard protocol after intravenous contrast injection of Ultravist® 370 (Bayer Schering Pharma, Berlin, Germany) at a weight-matched dose with a delay of 70-80 s for the portovenous phase of the chest and abdomen (tube voltage 100-120 kV with automatically calculated tube current, matrix of 512 × 512, in-plane resolution between 0.62 × 0.62 mm and 0.86 × 0.86 mm, section thickness of 2.0-5.0 mm). The use of two different CT scanners generated a heterogeneous data set that represents a routine clinical scenario as closely as possible.

Segmentation and Radiomic Feature Extraction
First introduced by Haralick et al. in 1973 [31], image feature extraction, such as histogram features or features from the co-occurrence matrix, has demonstrated eminent potential in various questions in different cancers [22,32].
Three-dimensional region-of-interest segmentation, texture analysis, and feature extraction were conducted using mint Lesion™ software (version 3.8.4, mint Medical GmbH, Heidelberg, Germany). Details of the extraction settings are given in Appendix A, Table A1. The schematic diagram for ROI segmentation and feature extraction for model development is shown in Figure 2. Eighty-five imaging features were extracted from each ROI: features related to the 3D size and shape, first-order statistics characterising the distribution of voxel intensities within the selected region, and features relating to the grey-level co-occurrence matrix (see Tables A2 and A3 in Appendix A).

Two board-certified radiologists, with over 10 years of experience in oncologic imaging and over 8 years' experience in texture analysis, analysed all images.
Three retroperitoneal lymph nodes along the infrarenal aorta were segmented per patient, resulting in 273 eligible samples randomly divided into a training set (n = 191) and a testing set (n = 82) at a ratio of 70:30.
Radiomic features were quantified regarding their distinctive pattern of grey levels within the ROI using texture feature descriptors according to the Image Biomarker Standardization Initiative (IBSI) guidelines [24].
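To make the feature families concrete, the following is a minimal sketch of one first-order feature and one co-occurrence feature, the two that are named elsewhere in this paper (Intensity Median Absolute Deviation and GLCM Cluster Shade). It is not the mint Lesion implementation: it assumes a tiny, already discretised 2D patch instead of a 3D ROI, a single neighbour offset, and the commonly used textbook definitions; the function names and the toy patch are hypothetical.

```python
from collections import Counter
from statistics import median


def median_absolute_deviation(values):
    """Mean absolute deviation of the intensities from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def glcm(image, offset=(0, 1)):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset.

    Returns a dict mapping (i, j) grey-level pairs to joint probabilities.
    """
    dr, dc = offset
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions (symmetric matrix)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def cluster_shade(p):
    """Third central moment of (i + j) under the co-occurrence distribution."""
    mu_i = sum(i * v for (i, _), v in p.items())
    mu_j = sum(j * v for (_, j), v in p.items())
    return sum(((i + j - mu_i - mu_j) ** 3) * v for (i, j), v in p.items())


# Hypothetical 3 x 3 patch with grey levels 1..3 (assumption, not study data).
roi = [
    [1, 1, 2],
    [1, 2, 3],
    [2, 3, 3],
]
flat = [v for row in roi for v in row]
features = {
    "intensity_median_absolute_deviation": median_absolute_deviation(flat),
    "glcm_cluster_shade": cluster_shade(glcm(roi)),
}
```

Cluster shade measures the skewness of the co-occurrence distribution, so a patch whose grey levels are distributed symmetrically, as in this toy example, yields a value close to zero.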

Feature Selection and Development of the Predictive Radiomics Model
Analogous to other data mining applications, radiomics extracts many texture features from the regions of interest [33].
For more generalisable, powerful, and faster modelling and reduced overfitting, we selected optimal features using logistic regression with the least absolute shrinkage and selection operator (LASSO) [34,35]. Each feature had an associated regression coefficient. As the penalty parameter λ increased, an increasing number of regression coefficients shrank towards 0. The variables whose coefficients remained non-zero were chosen as the best-performing predictors. The optimal hyperparameter λ = 0.001 was found by grid search [36,37].
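For reference, LASSO-penalised logistic regression is commonly written as the following optimisation problem. The notation is an assumption introduced here for illustration and is not taken from the study: N samples, m candidate features, feature vector x_i, binary label y_i, intercept β_0, and coefficient vector β.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{N} \sum_{i=1}^{N}
        \Bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Bigr]
      + \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert
    \right\},
\qquad
p_i = \frac{1}{1 + \exp\bigl(-(\beta_0 + x_i^{\top} \beta)\bigr)}
```

The absolute-value penalty is what drives individual coefficients exactly to zero as λ grows. Software packages scale the two terms differently (Scikit-learn, for example, parameterises the penalty through an inverse regularisation strength C), so a reported λ is only comparable within one implementation.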
The radiomics model was then developed by multivariable logistic regression, using the selected radiomic features as input variables to classify between the two outcome classes.
Patients with LN metastases within the 6-year observation period were assigned to the high-risk group, whereas those with complete remission were classified in the low-risk group.
To handle the imbalance between LN metastases (negative vs. positive, 81/10) and to avoid a bias toward the majority class, which by itself can produce a high classification rate, we applied the synthetic minority over-sampling technique (SMOTE) to the training cohort. SMOTE is an approach in which the minority class is over-sampled by creating "synthetic" examples rather than over-sampling with replacement. Thus, more related minority class samples to learn from are provided, allowing the learner to carve broader decision regions, leading to broader coverage of the minority class [38]. For greater generalisability of our results, we performed a stratified 10-fold cross-validation on the under-sampled data in all experiments to train and test the model, resulting in a train and test partition of 90% and 10%, respectively, for each fold. We performed patient-specific splits to ensure that each patient's lymph nodes remained together in either the training or test set. We reported the mean and standard deviation of the area under the ROC curve, accuracy, precision, recall, and F1-score over the test set results of the ten runs. Furthermore, receiver operating characteristic (ROC) curves were plotted for each cohort. To ensure that our model was more than just a complicated surrogate for volume, we repeated our experiments using only Volume and Mean Intensity as input features.
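The two safeguards described above, patient-specific splits and synthetic over-sampling of the minority class, can be illustrated with a minimal sketch. It is not the in-house pipeline used in this study: the helper names, the toy cohort, and the two made-up feature values per lymph node are hypothetical, and the interpolation step is a simplified SMOTE-style procedure that assumes at least two minority samples.

```python
import random


def patient_level_folds(patient_labels, n_folds, seed=0):
    """Assign whole patients to folds, stratified by outcome.

    patient_labels maps patient id -> 0/1 outcome. Returns a dict mapping
    patient id -> fold index, so all lymph nodes of a patient share a fold.
    """
    rng = random.Random(seed)
    fold_of = {}
    for label in (0, 1):
        ids = sorted(p for p, y in patient_labels.items() if y == label)
        rng.shuffle(ids)
        for k, pid in enumerate(ids):
            fold_of[pid] = k % n_folds
    return fold_of


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Toy cohort (assumption): 10 patients, two of them with the positive outcome.
patients = {f"P{i:02d}": int(i < 2) for i in range(10)}
folds = patient_level_folds(patients, n_folds=2)

# Three lymph-node samples per patient, each with two made-up feature values.
rng = random.Random(1)
samples = [
    (pid, (rng.random() + y, rng.random() + y), y)
    for pid, y in patients.items()
    for _ in range(3)
]

test_fold = 0
train = [s for s in samples if folds[s[0]] != test_fold]
test = [s for s in samples if folds[s[0]] == test_fold]

# Over-sample the minority class in the training fold only, so that no
# synthetic point is derived from a patient held out for testing.
train_minority = [x for _, x, y in train if y == 1]
train_majority = [x for _, x, y in train if y == 0]
synthetic = smote_like(train_minority, len(train_majority) - len(train_minority))
```

Resampling inside each training fold, rather than before splitting, keeps the held-out fold free of information derived from training patients; splitting at the patient level does the same for the three lymph nodes segmented per patient.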
The correlation coefficients and constant of the model were computed (Figure 3, Appendix A, Figure A1). It is worth mentioning that feature selection and model construction were based solely on the data of the training cohort.
Discrimination performance was assessed by the Harrell concordance index (C-index). The feature selection and the construction of the radiomics signature model were performed using our in-house software programmed with the Python Scikit-learn package (Python version 3.10, Scikit-learn version 0.23.3, http://scikit-learn.org/) [36,39].
The features IMAD (Intensity Median Absolute Deviation) and GCS (GLCM Cluster Shade) use the secondary axis; all other features use the primary axis.

Development of the Clinical and the Combined Prediction Models
The clinical factors included in our analysis were age, AFP level, β-hCG level, histotype (seminoma and non-seminoma), and body mass index (BMI). These factors were included as they have all been suggested to be prognostic in TGCT [40][41][42][43].
Our study included purely clinical and laboratory chemistry parameters to represent a real-life scenario for the individualised preoperative prediction of LNM at the time of the CT scan.
The selected clinical features and their relationship to lymph node metastasis were assessed with a univariable logistic regression algorithm in the training set. Variables with p < 0.2 from the univariable analysis were included for further application in a multivariable logistic regression algorithm using forward stepwise selection. A cutoff value of 0.25 is supported by the literature [44,45].
Then, multivariable logistic regression analysis was used to build three prediction models: a radiomics-only model, a clinical-only model and a combined clinical-radiomics model incorporating the selected radiomics and clinical features.
Their predictive performance for detecting LN metastasis was assessed using the receiver operating characteristic curve (ROC) analysis, in which the areas under the curve (AUC), accuracy, precision, and F1-Score were established.
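As an illustration of how these metrics relate to one another, the sketch below computes them from scratch for a handful of invented labels and predicted probabilities. The data, the 0.5 decision threshold, and the function names are assumptions made for the example and do not reproduce the study results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 3 positive and 5 negative samples with predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall and F1 depend on the chosen decision threshold, whereas the AUC summarises ranking quality over all thresholds; for a binary outcome, the concordance index reduces to this same pairwise probability.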
The clinical utility was demonstrated by decision curve analysis (DCA) to evaluate the net benefits of the prediction models at different threshold probabilities in the training cohort and compare their discriminatory performance.

Clinical Features
The study flowchart is presented in Figure 1. Ninety-one consecutive patients with histologically proven TGCT (mean age 35.2 ± 9.4 years, range 18-63) met the criteria for participation in the study. In this cohort, 10 patients (9.1%) relapsed within the six-year observation period (mean 9.8, 35.2 ± 9.4 years, range 18-63); there were no statistically significant differences in clinical characteristics between the LNM-positive group and LNM-negative group. After univariable LR analysis, age, AFP level, β-hCG level, histotype, and body mass index (BMI) were independent predictors in the clinical model.
All patients' baseline clinical characteristics are summarised in Table 1. In total, the dataset consisted of 273 sample instances (three LN ROIs/patient), with 33 instances in the category "relapse of disease" (minority class) and 240 instances in the category "without relapse of disease" (majority class). According to a proportion of 7:3, the 273 sample instances were randomly divided into a training cohort (n = 191) and a test cohort (n = 82).
Due to the class imbalance in the dataset, the under-sampling technique called "Instance Hardness Threshold" was used to balance the data. The balanced data were used for the logistic regression machine learning model.

Feature Selection and Performance of the Radiomics Prediction Model
A total of 85 radiomics features were extracted from the venous-phase CT images of the training cohort (Appendix A, Table A2). After screening these features, we chose the 12 radiomics features that had non-zero coefficients using the LASSO logistic regression model as the best-performing predictors for LN metastasis (Figure 3; Appendix A, Table A3).
These features were used as input variables for the machine learning-based radiomics modelling. Standard classification metrics, including accuracy, precision, F1-score, and the area under the ROC curve (AUC), were used to assess the performance in predicting lymph node metastases.
All tests were two-sided; p < 0.05 was considered statistically significant.
In the ROC analysis of the radiomics model, the classification evaluation metrics of the 10-fold cross-validation were AUC 0.84 ± 0.17, accuracy 0.76 ± 0.12, precision 0.80 ± 0.18, recall 0.72 ± 0.23, and F1 score 0.73 ± 0.17 in the training cohort (Table 2). Using only Volume and Intensity Mean as input features led to inferior results with an accuracy of 0.58 ± 0.16, with a precision and recall of 0.11 ± 0.07 and 0.43 ± 0.27, respectively.

Performance of the Clinical and the Combined Prediction Model
The clinical-only and combined clinical-radiomics models were built by applying multivariable logistic regression analysis.
The predictive performances of the radiomics-only, the clinical-only and the combined clinical-radiomics models on the training cohort are shown in Table 2.
The combined clinical-radiomics model showed the best prediction accuracy with 90% (AUC 0.94 ± 0.10), indicating that adding radiomics features could improve the predictive performance. Figure 4 shows the receiver operating characteristic (ROC) curves for the clinical, the radiomics, and the combined clinical-radiomics models on the training cohort.
We performed a decision curve analysis to assess the clinical value of the combined clinical-radiomics model. With threshold probability on the x-axis and net benefit on the y-axis, the decision curve analysis graph illustrates the trade-offs between true and false positives (describing benefit and harm) as the threshold probability changes (see Figure 5). The blue line shows the combined prediction model. The green line represents the hypothesis that no patients had LN metastases, and the orange line that all patients had LN metastases. The threshold probability is where the expected benefit of treatment equals the expected benefit of avoiding treatment. If the probability of LN metastasis exceeds the threshold probability, a therapeutic strategy for LN metastases should be adopted. The DCA of the combined model shows that if the threshold probability is between 0 and 0.13, using the combined model to predict LNM adds more benefit than treating either none or all of the patients.
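The quantity plotted on the y-axis of a decision curve can be sketched as follows, using the standard net-benefit definition (true positives minus false positives weighted by the odds of the threshold probability, both per patient). The function names and the ten invented patients are assumptions for illustration and do not reproduce Figure 5.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating every patient whose predicted risk is at or
    above the threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented cohort: one of ten patients has LN metastasis.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
risk = [0.6, 0.3, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01]

threshold = 0.1
nb_model = net_benefit(y_true, risk, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
nb_none = 0.0  # treating no one yields neither benefit nor harm
```

A model is considered useful at a given threshold probability when its net benefit exceeds that of both reference strategies, which is the comparison the decision curve makes across the whole range of thresholds.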

Discussion
We developed a clinical-radiomics model for the individualised preoperative prediction of LNM in testicular germ cell tumour (TGCT) patients. The model combines clinical risk factors and radiomics features to distinguish stage I TGCT patients who require adjuvant therapy from those who do not.
Our main findings can be summarised by the following: Using multivariable logistic regression analysis, we constructed a radiomics-only model, a clinical-only model, and a combined predictive model integrating clinical and radiomics features. The combined radiomics-clinical model showed the highest accuracy in predicting LNM (AUC = 0.89 ± 0.03; 95% CI); accuracy: 81%, precision 80%, recall 83%, and F1 score 81%.
Most TGCT patients initially present with stage I disease, and >95% of all stage I seminoma or non-seminoma patients are cured regardless of the therapeutic strategy [46][47][48]. This has led to controversies regarding adjuvant chemotherapy, radiotherapy, or retroperitoneal lymph node dissection following orchiectomy because of their short- and long-term side effects, such as secondary malignancies, cardiovascular disease, peripheral neuropathy, and loss of antegrade ejaculation [5][6][7][49].
The serum biomarkers AFP, β-hCG, and LDH are important tools for the diagnosis, prognostication, and monitoring of testicular cancer, which is reflected in the International Germ Cell Cancer Consensus Group prognostic index [17,50,51]. However, their sensitivity is limited; up to 40% of patients with recurrence have "normal" values [52].
Several studies have proposed further prognostic clinical risk factors, including age and BMI, but their roles have not yet been sufficiently clarified, with somewhat controversial discussion [40][41][42][43].
To date, neither imaging nor serum tumour markers have been proven to be suitable predictors of the presence of lymph node metastases [53,54]. However, the inherently excellent prognosis can be put at risk by suboptimal treatment, with over- and undertreatment being equally detrimental.
Several studies demonstrate the ability of radiomics based on MR or CT imaging to detect lymph node metastasis in other malignancies, including lung, oesophageal, breast, cervical, bladder, and colorectal cancer [28,29][55][56][57][58]. Classification accuracy in these studies ranged from 76% to 84%, which is lower than the results of our study.
Until now, few studies have been performed to distinguish between benign and malignant LN in testicular cancer.
In their study, Baessler et al. showed that a machine-learning classifier based on (CT) radiomics could predict the histopathology of lymph nodes after LN dissection following chemotherapy in patients with metastatic non-seminomatous germ cell tumours of the testis [59]. This single-centre retrospective study included eighty patients with a total of 204 lesions classified by a support vector model and achieved 81% classification accuracy.
Nevertheless, in contrast to our study, they did not include clinical variables in their radiomics approach to further increase diagnostic performance.
Furthermore, they split the study cohort, which was altogether of moderate size, into three subgroups, with only 19 patients in the test group and with an overall reduction in statistical significance as a result. To address the moderate size of our dataset, we used a cross-validation approach, which involves repeated data splitting to prevent overfitting while obtaining accurate estimates of the model coefficients [60]. Given our 10-fold cross-validation approach, the a priori inhomogeneity of our dataset, and the integration of clinical risk factors, we are convinced that our combined prediction model is more generalisable, and forthcoming investigations should further validate our trained model in prospective studies.
Beyond radiomics-based models, several clinical models exist to predict the occurrence of LNM in TGCT. However, these models yielded conflicting results and could not be included in today's clinical decision-making [53][62][63][64].
Taken together, identifying and implementing novel biomarkers might be helpful for early diagnosis and monitoring of disease relapse.
Our study is the first to use a combined CT-based radiomics model integrating clinical predictors for the individualised preoperative prediction of LNM in early-stage TGCT to reduce overtreatment in this group of young patients.
However, we acknowledge some limitations in the present study. As a retrospective study with a modest cohort size, there may be inevitable selection bias. Furthermore, classes were highly unbalanced, in line with the normal distribution, with 80% of all stage 1 TGCT patients showing an excellent outcome. Nevertheless, unlike prior radiomics investigations on LN metastasis that mostly extracted features from the largest cross-sectional area, our study performed whole lesion analysis by considering all available CT slices, thus providing abundant information about tumour heterogeneity.
Second, ours was a single-institution study. Due to our patient population's high cure rate, it is challenging to power studies to examine prognostic and predictive factors adequately. Prospective and multicentre validation is therefore warranted to obtain higher-quality evidence for clinical use.
Moreover, only one (imaging) modality and the circulating tumour markers β-HCG and AFP were used in this study. Among other prognostic factors, such as lymphovascular or rete testis invasion, tumour size is the most valuable prognostic factor for early-stage seminoma relapse [65,66]. Our study included solely clinical and laboratory parameters that can be collected easily, quickly, and non-invasively so that a preoperative risk assessment of the individual patient can already be made at the time of CT.
In addition to the known serum markers, studies show the potential of non-coding RNAs as biomarkers with stem cell-associated microRNAs (miR-371a-3p and miR-302/367 clusters) outperforming the conventional tumour markers in detecting newly diagnosed TGCT patients [67,68].
If more modalities were combined as a multi-omics approach, the obtained feature pool might increase the ability to predict LNM in patients with testicular cancer.
Our CT-based radiomics-clinical model represents an exciting non-invasive tool for the individualised prediction of LN metastasis in testicular germ cell tumours to reduce overtreatment in this young group of patients. Multi-centre, retrospective validations and prospective randomised clinical trials should be undertaken in subsequent studies to gain high-quality evidence for clinical applications.

Conclusions
In conclusion, our combined clinical-radiomics model applied on preoperative CT imaging represents an exciting new tool for improved prediction of lymph node metastases in early-stage testicular germ cell tumour (TGCT) patients to reduce overtreatment in this group of young patients. The presented approach should be combined with novel clinical biomarkers, such as microRNAs (miR-371a-3p and miR-302/367 cluster) and further validated in larger, prospective clinical trials.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.