Prognostic Value and Quantitative CT Analysis in RANKL Expression of Spinal GCTB in the Denosumab Era: A Machine Learning Approach

Simple Summary
Prognostic assessment of giant cell tumor of bone (GCTB) is an ongoing challenge in the treatment and management of bone tumors. Recurrence rates of spinal GCTB are higher than those of GCTB at other bone sites, presumably due to a more aggressive pathology and/or the conservative surgery performed to preserve spinal cord and nerve function and decrease postoperative complications. A more accurate prognosis of GCTB will help to inform the choice of treatment methods. This retrospective study investigated prognosis-related molecular markers in spinal GCTB, including RANKL (the target of denosumab), and focused on using machine learning analysis of pre-operative CT to evaluate RANKL status, which may facilitate the selection of better disease management strategies.

Abstract
The receptor activator of the nuclear factor kappa B ligand (RANKL) is the therapeutic target of denosumab. In this study, we evaluated whether radiomics signature and machine learning analysis can predict RANKL status in spinal giant cell tumors of bone (GCTB). This retrospective study consisted of 107 patients, including a training set (n = 82) and a validation set (n = 25). Kaplan-Meier survival analysis was used to validate the prognostic value of RANKL status. Radiomic features were extracted from three heterogeneous regions (VOIentire, VOIedge, and VOIcore) on pretreatment CT. After feature selection using Select K Best and least absolute shrinkage and selection operator (LASSO) analysis, three classifiers (random forest (RF), support vector machine, and logistic regression) were used to build models. The area under the curve (AUC), accuracy, F1 score, recall, precision, sensitivity, and specificity were used to evaluate the models’ performance. Among the 75 patients with eligible follow-up, stratification by RANKL status showed a significant difference in progression-free survival (p = 0.035). The VOIcore-based RF classifier performed best, with AUCs of 0.880 and 0.766 in the training and validation cohorts, respectively. In conclusion, a machine learning approach based on CT radiomic features could discriminate prognostically significant RANKL status in spinal GCTB, which may ultimately aid clinical decision-making.


Introduction
Over the last few years, our knowledge of the receptor activator of nuclear factor kappa-B (RANK)/RANK ligand (RANKL) pathway has expanded, and targeting RANKL is being evaluated as anticancer therapy [1,2]. Overexpression of RANKL increases the risk of metastasis in giant cell tumors of bone (GCTB) [3]. However, the effects of RANKL levels on the long-term survival of patients with spinal GCTB have not yet been elucidated.
This study was approved by the Institutional Research Ethics Board and adhered to the tenets of the Declaration of Helsinki (institutional review board "M2019460", National Clinical Trial number "NCT04952818"). The requirement for signed informed consent was waived. We identified 135 consecutive patients with spinal GCTB who underwent surgery at our institution from January 2009 to February 2022. Inclusion criteria were as follows: (a) a histopathological diagnosis of spinal GCTB; (b) baseline CT performed before surgery; and (c) treatment by surgical resection.
Among these 135 patients, 28 were excluded for the following reasons: (a) the CT scan was performed more than 2 weeks before surgery; (b) the patient had a previous history of biopsy or treatment for the tumor lesion; (c) poor image quality, e.g., peri-lesion artifacts; (d) missing postoperative paraffin specimen for reanalysis; (e) negative H3F3A (H3 histone, family 3A) staining; and (f) incomplete basic clinical data.
A total of 107 patients (mean age 32.94 ± 12.99 years; M:F = 45:62; age range: 18-71) were included in the final study cohort (Figure 1). Taking September 2019 as the cut-off point, they were divided into a training cohort (n = 82) and a validation cohort (n = 25).

Immunohistochemistry (IHC) Analysis
Samples from all cases were stained with hematoxylin-eosin (HE) and for H3F3A (RevMAb Biosciences, South San Francisco, CA, USA, Catalog No. 31-1145-00). IHC staining for RANKL was performed using an anti-RANKL antibody from Abcam (No. ab222215). For each section, staining intensity and the percentage of positive cells were evaluated by two independent pathologists who were blinded to clinical and pathological information. Similar to previous studies [3], RANKL staining was scored based on intensity (0 = no expression; 1 = weak to moderate expression; 2 = strong expression) and percentage of positive cells (0 = ≤10%; 1 = 10-25%; 2 = 26-50%; 3 = ≥50%). The two scores were summed, and the sum was classified as low expression (0-3) or high expression (4-5).
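The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the pathologists' actual workflow: the function names are hypothetical, and because the published percentage bands share the values 10% and 50%, the handling of those boundaries below is an assumption.

```python
def percentage_score(percent_positive: float) -> int:
    """Map the percentage of RANKL-positive cells to a 0-3 score.

    Assumption: exactly 10% scores 0 and exactly 50% scores 3, because the
    text lists "<=10%" for score 0 and ">=50%" for score 3.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive <= 10:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive < 50:
        return 2
    return 3


def rankl_status(intensity: int, percent_positive: float) -> str:
    """Combine intensity (0-2) and percentage score (0-3) into low/high."""
    if intensity not in (0, 1, 2):
        raise ValueError("intensity must be 0, 1, or 2")
    total = intensity + percentage_score(percent_positive)
    # Sum of 0-3 is low expression; sum of 4-5 is high expression.
    return "high" if total >= 4 else "low"
```

With this rule, a strongly stained section needs at least 26% positive cells to be classified as high, and a weakly to moderately stained section needs at least 50%.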

Postoperative Follow-Up and Clinical Data
Patients were followed up according to the following schedule: once every 3 months during the first 2 years; once every 6 months from 3-5 years; and annually after 5 years. Local disease recurrence was confirmed when MRI showed a new mass at the resection site, with or without a tissue biopsy.
All clinical data, including age, gender, location, tumor stage (Enneking stage), radiotherapy, and denosumab treatment, were obtained from the medical records. Imaging assessment of tumor size included physician measurement (Y) of the longest diameter, while tumor volume was computed automatically from a manually constructed 3D tumor mask.
Based on previous studies, surgery and preoperative radiotherapy may have an impact on postoperative recurrence, so we included the 75 patients who underwent total en bloc spondylectomy (TES) without preoperative radiotherapy in the survival analysis to evaluate the prognostic significance of RANKL levels. Two cases are shown in Figures 2 and 3.

Computed Tomography Imaging and Multiregional Labeling
CT scans were performed using a Sensation-64 scanner (SOMATOM Definition; Siemens, Erlangen, Germany) and a 64-slice spiral CT scanner (LightSpeed; GE Medical System, London, UK), with parameters including 120 kVp and automatic mAs. The collimation widths were 0.625 and 0.60 mm, and the pitch was 1.0. All CT images were retrieved from the picture archiving and communication system (PACS) in Digital Imaging and Communications in Medicine (DICOM) format. CT scans were isovolumetric, so all images could be reconstructed for analysis.
Three heterogeneous regions (VOIentire, VOIedge, and VOIcore) were segmented and labeled. Manual segmentation was conducted using Research Portal V1.1 (United Imaging Intelligence, Co., Ltd., Shanghai, China). VOIentire was segmented on axial CT images by a radiologist with 5 years of experience in spinal tumor diagnosis. In addition, 35 randomly selected patients were segmented by a second musculoskeletal radiologist with 13 years of experience to construct a test-retest set and calculate the intraclass correlation coefficients (ICCs) of radiomic features.
Next, VOIentire was shrunk by 2 mm or 3 mm to generate the other two VOIs: the marginal band removed by the shrinkage represents the transition zone of the bone tumor (VOIedge), and the remaining core represents the more central area of the bone tumor (VOIcore). VOI shrinkage was performed using the erosion function of Research Portal V1.1. Figure 4 presents the schema for segmentation, radiomic feature extraction, and predictive modeling. We also evaluated the choice of edge band width for VOIedge; the detailed process is shown in Supplementary Figure S1.
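The margin shrinkage described above can be approximated with a standard morphological erosion. The sketch below uses SciPy rather than Research Portal V1.1, so it is a stand-in and not the software's actual erosion function; the function name, the 6-connected structuring element, and the assumption of isotropic 1 mm voxels are all illustrative choices.

```python
import numpy as np
from scipy import ndimage


def split_core_and_edge(voi_entire, shrink_mm=3, voxel_mm=1.0):
    """Split a whole-tumor mask into (core, edge) boolean masks.

    Assumes isotropic voxels of size voxel_mm, so that one erosion step
    with a 6-connected element removes roughly one voxel layer.
    """
    voi_entire = np.asarray(voi_entire, dtype=bool)
    iterations = int(round(shrink_mm / voxel_mm))
    if iterations < 1:
        raise ValueError("shrink_mm must be at least one voxel")
    structure = ndimage.generate_binary_structure(3, 1)  # 6-connected
    # Core: what is left after eroding the whole-tumor mask.
    core = ndimage.binary_erosion(
        voi_entire, structure=structure, iterations=iterations
    )
    # Edge: the band of voxels removed by the erosion.
    edge = voi_entire & ~core
    return core, edge
```

By construction the two masks do not overlap and together reproduce the whole-tumor mask, which matches the description of VOIedge and VOIcore as complementary parts of VOIentire.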

Radiomics Feature Extraction and Preprocessing
Radiomics feature extraction, feature selection, and machine learning modeling were performed on Research Portal V1.1 (United Imaging Intelligence, Co., Ltd., Shanghai, China). Prior to radiomic feature extraction, B-spline interpolation resampling was used to normalize voxel size, and anisotropic voxels were resampled to isotropic voxels of 1.0 × 1.0 × 1.0 mm. A total of 2264 radiomics features were extracted for each tumor lesion from preoperative CT imaging, including first-order features, shape features, texture features, and high-level features. The shape features were extracted from the VOIs in the original image, and the remaining features were extracted from both the original image and the filtered images.
Intraclass correlation coefficients (ICCs) were determined to assess the reproducibility of the radiomics features extracted from the VOIs drawn by three independent radiologists. Features with an ICC ≥ 0.8 were considered reliable. Before feature selection, all features were normalized by replacing outliers with the median of the corresponding feature vector and then standardized using Z-score standardization.
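As an illustration of the reproducibility screen, the sketch below computes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1), and keeps features at or above the 0.8 threshold. The text does not state which ICC form was used, so the choice of ICC(2,1) is an assumption, and the function names are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an array of shape (n_subjects, n_raters)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)  # subjects
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)  # raters
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def stable_feature_mask(reader_a, reader_b, threshold=0.8):
    """Boolean mask over features; inputs have shape (n_patients, n_features)."""
    reader_a = np.asarray(reader_a, dtype=float)
    reader_b = np.asarray(reader_b, dtype=float)
    return np.array([
        icc_2_1(np.column_stack([reader_a[:, j], reader_b[:, j]])) >= threshold
        for j in range(reader_a.shape[1])
    ])
```

The sketch does not guard against constant features, for which the ICC is undefined; a real pipeline would need to drop or flag those before screening.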

Feature Selection and Machine Learning Analysis Strategy
Radiomics features were extracted from different VOIs (multiregions), and then Select K Best analysis and the least absolute shrinkage and selection operator (LASSO) method were used to screen the extracted image features related to RANKL expression. LASSO regularization involves a parameter, λ, that controls the number of selected features: a larger λ shrinks more coefficients to zero and therefore retains fewer features. The final number of features was determined by the λ that maximized the C-index in the training set. A multiple-feature-based radiomics signature (radiomics score, rad-score) was calculated for each patient as a linear combination of the selected features, each weighted by its respective coefficient.
Three classifiers were used in the prediction models: random forest (RF), support vector machine (SVM), and logistic regression (LR). According to the events per independent variable (EPV) values recommended for multivariable prediction models [15], we selected the 3 features (10 EPV) with the highest efficiency to construct models from the three heterogeneous regions (VOIentire, VOIedge, and VOIcore). We also compared model performance for edge band widths of 2 mm and 3 mm. In total, 15 models were constructed and compared.
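A selection-then-classification chain of this kind can be sketched with scikit-learn. This is not the Research Portal implementation: the data are synthetic, the value k = 20 for Select K Best and the random forest settings are arbitrary assumptions, and the LASSO step simply keeps the three features with the largest absolute coefficients.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: 82 "patients" x 200 "features".
rng = np.random.default_rng(0)
X = rng.normal(size=(82, 200))
# Hypothetical binary label (1 = high RANKL) driven by the first two features.
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=82) > 0).astype(int)

pipeline = make_pipeline(
    StandardScaler(),                  # Z-score standardization
    SelectKBest(f_classif, k=20),      # univariate pre-screening (k assumed)
    SelectFromModel(                   # keep the 3 largest |LASSO coefficients|
        LassoCV(cv=10, random_state=0), max_features=3, threshold=-np.inf
    ),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
pipeline.fit(X, y)

# Number of features that reach the classifier, and the apparent training AUC.
n_selected = pipeline[:-1].transform(X).shape[1]
train_auc = roc_auc_score(y, pipeline.predict_proba(X)[:, 1])
```

Putting the selection steps inside the pipeline means they are refitted on each training fold when the pipeline is cross-validated, which avoids leaking validation data into feature selection. The training AUC computed here is optimistic and would need to be checked on held-out data.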

Statistical Analysis
Continuous variables are expressed as mean ± SD or median (interquartile range, IQR). Categorical variables are expressed as counts and percentages (n, %). The Wilcoxon signed-rank or Kruskal-Wallis tests were used to compare numerical variables, and Fisher's exact test was used to compare categorical variables. Progression-free survival (PFS) was analyzed using the Kaplan-Meier method and Cox proportional-hazards regression. Statistical analyses were conducted using SPSS (version 24.0, Chicago, IL, USA) and MedCalc (version 15.0, Mariakerke, Belgium). To assess whether the radiomic signature score could separate patients into low or high RANKL expression groups, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and its confidence interval (CI) were determined in accordance with the DeLong method. A two-tailed p < 0.05 was considered statistically significant.
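For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be written in a few lines of plain Python. The analyses in this study were run in SPSS and MedCalc; the function below is only a minimal sketch, and the follow-up times in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up duration per patient (e.g., months).
    events: 1 if recurrence was observed at that time, 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Recurrences observed exactly at time t.
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve
```

For example, `kaplan_meier([6, 12, 12, 24, 36], [1, 0, 1, 1, 0])` steps down at months 6, 12, and 24, while the censored patients at months 12 and 36 only reduce the number at risk.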

Patient Information
Of the 107 patients included in the final study cohort (Table 1), sixty-six (62%) exhibited high levels of RANKL expression. The results and grading of immunohistochemical staining are shown in Figure 5A. In the follow-up cohort, Kaplan-Meier survival analysis showed a statistically significant difference in PFS between the RANKL expression groups (Figure 5B). There was no difference in postoperative survival status based on patient age (p = 0.190) or gender (p = 0.160) (Supplementary Figure S2). No correlation was observed between patient age or gender and RANKL expression (p > 0.05), and these factors were not included in the machine learning analysis.

Feature Robustness for Multiregional VOIs
Quantification of all extracted features from each VOI is shown in Supplementary Table S1. The three VOIs retained different numbers of valid features after screening by ICC (threshold = 0.8).
In the entire tumor vs. margin shrinkage segmentation, the stable feature rates were 51.06% (n = 1156), 43.95% (n = 995), and 54.37% (n = 1231) for VOIentire, VOIedge, and VOIcore, respectively. The number of stable features derived from 3D margin shrinkage segmentation was higher compared to the entire tumor and marginal zone (p = 0.038). The other groups were not significantly different (p = 0.101 and p = 0.200, respectively). Table S1 details the number and percentage of stable features obtained with the different 3D segmentation methods, grouped according to feature class. In the training cohort, with 10 repetitions of 10-fold cross-validation by the LASSO method, the top 3 features with the largest coefficients were selected in each run. Therefore, 300 feature selections (10 × 10 × 3), including repeats, were recorded for each VOI. Features and repetitions are shown in Supplementary Figure S3 (edge band width 3 mm). The three most stable and effective features from each regional VOI were selected for model construction.
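The stable feature rates and the count of 300 follow directly from the numbers reported above, as the short check below shows. The variable names are illustrative only.

```python
TOTAL_FEATURES = 2264  # features extracted per lesion

# Features with ICC >= 0.8 in each region, as reported in the text.
stable_counts = {"VOIentire": 1156, "VOIedge": 995, "VOIcore": 1231}

# Stable feature rate (%) per region.
stable_rates = {
    name: round(100 * count / TOTAL_FEATURES, 2)
    for name, count in stable_counts.items()
}

# 10 repetitions x 10 folds x top 3 features per run.
selections_per_voi = 10 * 10 * 3
```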

Performance of Models Based on Different Classifiers
We compared the performance of models based on different VOIs and classifiers (Figure 6). Among the radiomics models, the VOIcore model was the best performer; the AUC values of the three models are displayed as a heatmap (Figure 7). For each VOI model, the RF classifier performed better than SVM and LR, and the statistical indicators of DeLong's test are given in Supplementary Table S2. With an edge band width of 3 mm, the performance of both the core region VOI and the edge region VOI models improved. The corresponding performance parameters of each model (edge width 3 mm), including AUC, F1 score, recall, precision, sensitivity, specificity, and accuracy, are given in Table 2.

Performance and Validation of the Final Prediction Models
The RF model based on the core region VOI (VOIcore) with 3 mm edge shrinkage was selected for independent validation. The features of the different VOIs used to build the models are shown in Supplementary Table S3. The ROC curve and confusion matrix of the predicted results are shown in Figure 8. In the confusion matrix, the columns represent the predicted classes and the rows represent the true classes. The accuracy, precision, sensitivity, and specificity of the model were 0.640, 0.647, 0.785, and 0.454, respectively.
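The four reported metrics are all simple ratios of confusion-matrix counts. In the sketch below, the counts (11, 3, 6, 5) are not taken from Figure 8; they are an assumed set of counts for n = 25 that happens to reproduce the reported values to within rounding, with high RANKL expression treated as the positive class.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, sensitivity, and specificity from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
    }


# Assumed counts for the 25 validation patients (not read from Figure 8).
metrics = binary_metrics(tp=11, fn=3, fp=6, tn=5)
```

Under these assumed counts, the low specificity reflects that about half of the low-expression cases would be predicted as high expression, which is worth keeping in mind when reading the validation AUC.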

Discussion
In this study, we conducted a survival analysis to validate the prognostic value of RANKL expression levels in a cohort of patients with spinal GCTB and postoperative follow-up. We also applied machine learning methods based on radiomics signatures extracted from preoperative CT imaging to classify RANKL status in patients with spinal GCTB.
Treatment of GCTB located in the spine is challenging, as en bloc or wide resection is technically difficult and recurrence is common [6]. Improving the prognosis of spinal GCTB is a current focus for surgeons. Given the risks faced by patients undergoing the procedure, accurate preoperative stratification of prognosis can help in treatment planning and precise treatment. Using bioinformatics and combinatorial screening approaches to determine biomarker expression status could be useful in identifying patients who may benefit most from treatment. The RANK/RANKL pathway is often overexpressed and has been positively correlated with tumor progression and advanced disease in primary malignant tumors of the bone, including osteosarcoma, multiple myeloma, and GCTB [16-20]. Moreover, RANK and RANKL expression is often higher in malignant histological subtypes of bone cancer. For example, RANKL expression is often elevated in Stage III GCTB and is a useful prognostic marker for predicting the risk of local disease recurrence [21]. Additionally, elevated RANK and RANKL may significantly increase the risk of metastasis [3]. In this study, we show for the first time in a clinical cohort that RANKL expression status is significantly correlated with disease prognosis in patients with spinal GCTB. This suggests that drugs targeting the RANK/RANKL pathway may improve patient outcomes beyond merely inhibiting bone tissue destruction. Patients with high RANKL expression who experienced recurrence, as in Figure 2, might have had a better prognosis had their RANKL status been correctly evaluated before surgery, which could have prompted consideration of targeted drug therapy, expanded resection, and postoperative radiation therapy.
Imaging is important in evaluating cancers for surgical planning, prognosis prediction, and post-treatment assessment. Advances in quantitative image analysis methods may offer a more comprehensive approach that includes spatiotemporal information, producing image-driven biomarkers that may provide a deeper understanding of cancer biology and ultimately aid better clinical decisions [22,23]. Campanacci et al. applied conventional lesion size assessment and a new CT-based measure of the degree of ossification in a comparative analysis of 36 patients before and after denosumab treatment. In our cohort, only some of the patients treated with denosumab (n = 15) underwent CT and/or MR scans. It is important to note that, despite FDA approval in 2013, denosumab (Xgeva) has only been available in our country since 2020. Among patients who had CT scans before and after treatment (n = 10), only 2 had local recurrence, implying that it may be difficult to statistically validate the predictive effect of the new CT evaluation proposed in that study in our cohort. Recurrence is a clinical observation; it is worth noting that the article suggests that tumor treatment response is associated with postoperative recurrence, whereas at the level of microscopic expression, treatment response is associated with the status of the therapeutic target.
Currently, DNA sequencing and immunohistochemistry are the most accurate methods for molecular biomarker assessment in tumors. However, these methods have limitations such as invasiveness, sampling error, and complications. Tumor multi-omics can be combined with predictive machine learning models, which could be a new digital method on the road to precision cancer medicine. In previous studies of biomarkers for tumor types such as lung cancer, glioma, breast cancer, and prostate cancer, radiomics has shown potential as a means to non-invasively predict the status of tumor biomarkers [24-31]. Radiomics combined with a noninvasive machine learning model trained against tumor immunohistochemistry could improve treatment selection. Our study developed a radiomic signature of RANKL expression status in spinal GCTB based on CT imaging. The optimal model had an AUC of 0.88 in the training cohort, 0.745 in the validation cohort, and 0.766 in the test set. This radiomic signature could assist surgeons in evaluating patient prognosis before surgery and predicting drug treatment response, which would be helpful for clinical decision-making. Previous studies have also applied radiomics for preoperative differential diagnosis, recurrence prediction, and biomarker assessment of GCTB [32-34]. Although our study is preliminary, this may be a promising method for improving the precise treatment of spinal GCTB.
This study used three classifiers to construct models based on different VOIs to predict RANKL status in spinal GCTB patients. A total of 15 models were built and compared by ROC analysis. According to the AUCs, the RF classifier was better than the other two classifiers. Previous studies demonstrated that the ideal classifier for model construction may differ across organs, and RF may also inform prognostic models for other malignancies, including hepatocellular carcinoma [35], stomach cancer [36-38], rectal cancer [39,40], glioma [41], lung cancer [42], pelvic bone tumor [43], and soft tissue tumors [44]. The RF classifier also showed the best performance in recognizing the RANKL status of patients in this study, regardless of the type of VOI.
In radiomics studies, selecting more features than the sample size supports may lead to overfitting. In our study, we followed the 10 EPV rule for binary classification problems [45-47]. According to the size of the smallest group in our study cohort (n = 30), three features were chosen for the construction of the different models. To ensure the robustness of the analysis, we selected the three most important radiomics features (in terms of efficacy and frequency) across 100 repetitions. This may be a way to improve the repeatability and reproducibility of radiomics. We also summarized the features extracted from multiregional VOIs to compare feature stability. The number of stable features derived from VOIcore was higher than that from VOIentire, and the features from VOIcore were more concentrated. The RF model based on the tumor core region (VOIcore) achieved the highest AUC. These results suggest that RF classifiers applied to the core region of the tumor might perform better in CT image-based GCTB-related tasks.
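The feature budget implied by the EPV rule is a one-line calculation, sketched here with a hypothetical helper name.

```python
def max_features_by_epv(n_smallest_group: int, epv: int = 10) -> int:
    """Largest number of predictors allowed at the given events per variable."""
    if n_smallest_group < 0 or epv < 1:
        raise ValueError("group size must be >= 0 and epv must be >= 1")
    return n_smallest_group // epv


# Smallest outcome group in the cohort, as stated in the text.
feature_budget = max_features_by_epv(30)
```

With 30 patients in the smallest group and 10 events per variable, the budget is three features, which is the number used in every model.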
Some limitations of our study are worth noting. First, this was a single-institution study with a limited sample size, a common concern in "data-hungry" AI-based research. Although our institution is a regionally renowned hospital specializing in spinal tumor surgery, it is quite difficult to expand the sample size because of the low incidence of GCTB in the spine. In the future, multicenter studies may improve the robustness and generalizability of radiomics results. Second, our study did not examine the effects of different surgical procedures on postoperative survival; survival analysis was performed only in patients after TES. TES is currently the mainstream surgical method for spinal tumors, and total vertebral resection is the surgeon's main goal unless it cannot be achieved. Third, enhanced CT and MRI images were not analyzed in this retrospective design. In our cohort, preoperative CT without contrast was the most commonly performed examination (n = 107), while fewer patients underwent enhanced CT (n = 43) or MRI (n = 65). Machine learning based on other scanning modalities may be promising in the future. Finally, due to the limited scope of this study, the generalizability of our findings needs to be confirmed on additional datasets.

Conclusions
RANKL expression status may be an important molecular marker for the postoperative survival of spinal GCTB patients. We present an RF-based machine learning method that assessed RANKL status in patients with spinal GCTB with stable performance, achieving AUCs of 0.880 and 0.766 in the training and validation cohorts, respectively. Radiomics features based on 3D margin shrinkage segmentation showed good performance in the preoperative evaluation of low/high RANKL expression.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14215201/s1, Figure S1: Two patients with different lesion sizes; Figure S2: Kaplan-Meier survival analysis for gender and age; Figure S3: Feature names extracted from different regions and the number of repetitions; Table S1: The total extracted features of each VOI, the number and percentage of features after ICC inspection of VOIs in different regions; Table S2: DeLong test results between different models; Table S3: Features selected by the multi-region VOI models.