Predicting Peri-Operative Outcomes in Patients Treated with Percutaneous Thermal Ablation for Small Renal Masses: The SuNS Nephrometry Score

Our objective was to develop a new, simple, and ablation-specific nephrometry score to predict peri-operative outcomes and to compare its predictive accuracy to PADUA and RENAL scores. Overall, 418 patients were treated with percutaneous thermal ablation (microwave and radiofrequency) between 2008 and 2021. The outcome of interest was trifecta status (achieved vs. not achieved), with trifecta not achieved defined as incomplete ablation, Clavien–Dindo ≥ 3 complications, or a postoperative estimated glomerular filtration rate decrease ≥ 30%. First, we validated the discrimination ability of the PADUA and RENAL scoring systems. Second, we created and internally validated a novel scoring (SuNS) system, according to multivariable logistic regression models. The predictive accuracy of the model was tested in terms of discrimination and calibration. Overall, 89 (21%) patients did not achieve trifecta. PADUA and RENAL scores showed poor ability to predict trifecta status (c-indexes 0.60 [0.53–0.67] and 0.62 [0.55–0.69], respectively). We, therefore, developed the SuNS model (c-index: 0.74 [0.67–0.79]) based on: (1) contact surface area; (2) nearness to renal sinus or urinary collecting system; (3) tumour diameter. Three complexity classes were created: low (3–4 points; 11% of no trifecta) vs. moderate (5–6 points; 30% of no trifecta) vs. high (7–8 points; 65% of no trifecta) complexity. Limitations include the retrospective and single-institution nature of the study. In conclusion, we developed an immediate, simple, and reproducible ablation-specific nephrometry score (SuNS) that outperformed PADUA and RENAL nephrometry scores in predicting peri-operative outcomes. External validation is required before daily practice implementation.


Introduction
The incidence of kidney cancer is increasing worldwide [1]. Specifically, incidental detection during cross-sectional imaging has led to a higher number of diagnoses of small renal masses (SRMs) in recent decades [2]. Radical or partial nephrectomy (PN) still represents the first-choice treatment [3], while image-guided percutaneous thermal ablation (PTA) has been proposed as an effective minimally invasive alternative, with reported good oncologic results and low complication rates [4][5][6].
Nephrometry scores (NSs) are mostly based on easily collectable, preoperative tumour characteristics, not including patients' clinical parameters, and are used to assess the complexity of the procedure and to predict short- and mid-term outcomes. Moreover, NSs are tools to standardize the reporting of tumour characteristics in clinical practice and research. To date, the PADUA [7] and RENAL [8] NSs, developed on surgically treated cohorts, are the most widely used and established methods to assess the complexity of SRMs. However, their application to PTA has produced conflicting results. An association between NS and complications [11][12][13], estimated glomerular filtration rate (eGFR) [14] and oncologic outcomes [15][16][17] has been reported by some studies, but not confirmed by others [18][19][20]. A few ablation-specific NSs have been developed, but none of them is widely accepted in clinical practice [15,18,19,21,22].
We hypothesized that a new, simple, ablation-specific score (SuNS) might better predict peri-operative outcomes, compared to PADUA and RENAL NSs. To address this hypothesis, we relied on a large (n = 418) contemporary cohort of patients treated at a single institution with PTA for SRMs.

Patient Selection
This retrospective single-institution data analysis was approved by the Institutional Review Board of the European Institute of Oncology.
Overall, 418 patients with SRMs (T1a-b kidney cancer) were treated with PTA between 2008 and 2021 in a tertiary referral center. We focused on patients aged ≥18 years who had undergone either radiofrequency (RF) or microwave (MW) ablation. Age, sex, age-adjusted Charlson comorbidity index (CCI), body mass index, SRM side, location, diameter, T stage, contact surface area (CSA) [23][24][25], nearness to renal sinus, urinary collecting system, ureter, bowel, skin-to-tumour distance, and histology were recorded. Analyses did not include residual or recurrent SRMs.

Thermal Ablation Procedure
The PTA technique has been previously described [26]. All procedures were performed under general anesthesia, in a dedicated operating room equipped with both ultrasound (US) and computed tomography (CT), with the patient lying in the most favourable position for a direct needle approach. All procedures were performed by a team of two interventional radiologists, at least one of whom had more than ten years of experience. Adjunct procedures such as hydrodissection or pyeloperfusion were used when appropriate, based on the team's clinical judgment [5]. Procedures were guided by US and CT or by US-CT fusion imaging [27]. Ablation was performed with an RF 3000 system (Boston Scientific, Natick, MA, USA) and a retrievable hook-umbrella needle or with an MW system (Emprint, Medtronic, Minneapolis, MN, USA). In cases of suspected persistence of pathological tissue, device repositioning and retreatment were performed in the same manner as previously described. A contrast-enhanced CT (CECT) scan was performed 24 h after PTA to evaluate treatment results, ablation completeness and possible complications. Blood tests, including complete blood count, creatinine and eGFR, were performed 24 h and 48 h after PTA. Patients were then followed according to standard protocols with a CT and a clinical visit at 6 weeks, 3, 6, 12, 18 and 24 months and yearly thereafter.

Variables of Interest
CSA was originally described by Leslie et al. in a PN cohort [23]. It can be easily computed using the formula 2πrd (r: radius; d: depth of invasion) [24]. We modified the CSA cut-off (previously defined as <20 vs. ≥20 cm²) to <10 vs. ≥10 cm² to account for the smaller median size of SRMs in a PTA cohort. RENAL and PADUA NSs and derived complexity classes were assigned to all patients as previously described [7,8]. The outcome of interest for both validation and development was trifecta status (achieved vs. not achieved). Trifecta was considered not achieved in the presence of any of the following: partial SRM ablation detected with CECT 24 h after the procedure, a postoperative complication of Clavien-Dindo grade ≥ 3 [28], or a postoperative eGFR decrease ≥ 30% [29].
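The two definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function and parameter names are hypothetical, the Clavien-Dindo grade is simplified to an integer (0 meaning no complication, sub-grades such as 3a/3b collapsed to 3), and the eGFR decrease is assumed to be expressed as a percentage of the baseline value.

```python
import math


def contact_surface_area(radius_cm: float, depth_cm: float) -> float:
    """Contact surface area in cm^2, using CSA = 2 * pi * r * d."""
    if radius_cm < 0 or depth_cm < 0:
        raise ValueError("radius and depth must be non-negative")
    return 2 * math.pi * radius_cm * depth_cm


def csa_category(csa_cm2: float, cutoff_cm2: float = 10.0) -> str:
    """Dichotomize CSA at the 10 cm^2 cut-off adopted for the PTA cohort."""
    return ">=10 cm2" if csa_cm2 >= cutoff_cm2 else "<10 cm2"


def trifecta_achieved(complete_ablation: bool,
                      clavien_dindo_grade: int,
                      egfr_decrease_pct: float) -> bool:
    """Trifecta is achieved only if none of the three failure criteria is met.

    Failure criteria: incomplete ablation on 24 h CECT, a Clavien-Dindo
    grade >= 3 complication, or a postoperative eGFR decrease >= 30%.
    The integer grade and the percentage scale are assumptions of this sketch.
    """
    return (complete_ablation
            and clavien_dindo_grade < 3
            and egfr_decrease_pct < 30.0)
```

For instance, a mass with a 1.5 cm radius and a 1.0 cm depth of invasion gives a CSA of about 9.4 cm², which falls below the 10 cm² cut-off, whereas a 2.0 cm radius with the same depth gives about 12.6 cm², which falls above it.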

Statistical Analysis
First, we compared the demographic, radiologic, pathologic and outcome differences, stratifying the cohort according to trifecta status. Second, multivariable logistic regression models predicting trifecta status according to PADUA and RENAL scoring systems [7,8] were fitted and the discrimination ability of both scores was estimated using Harrell's c-index. Third, separate univariable logistic regression models tested the association between available SRM features and trifecta status. Fourth, statistically significant predictors of trifecta status were used to fit a multivariable logistic regression model (SuNS). Fifth, the predictive accuracy of the model was evaluated with respect to discrimination (i.e., the ability of the model to distinguish a patient achieving trifecta from one not achieving it) using Harrell's c-index and calibration (i.e., the agreement between the outcome frequencies observed in the data and the predicted probabilities of the model) depicted with a loess plot. Additionally, 2000 bootstrap resamples were used to estimate the PADUA, RENAL and SuNS c-indexes. Finally, a sensitivity analysis was conducted to evaluate discrepancies in the ability of the SuNS model to predict trifecta status according to the ablation technique. All statistical tests were two-sided, with the level of significance set at p < 0.05, and were performed with the R Software Environment for Statistical Computing and Graphics (R version 4.1.3, R Foundation for Statistical Computing, Vienna, Austria).
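As an illustration of the discrimination step, the sketch below computes a c-index for a binary outcome (where it coincides with the area under the ROC curve) and a bootstrapped estimate with a percentile interval. The analyses in this study were run in R; this Python version is only a sketch under stated assumptions. The function names are hypothetical, and the percentile interval is an assumption, since the text does not specify how the bootstrap interval was derived.

```python
import numpy as np


def c_index(outcome, predicted) -> float:
    """C-index for a binary outcome: the share of (event, non-event) pairs
    in which the event has the higher predicted risk; ties count as 0.5."""
    y = np.asarray(outcome, dtype=bool)
    p = np.asarray(predicted, dtype=float)
    pos, neg = p[y], p[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both outcome classes are required")
    diff = pos[:, None] - neg[None, :]
    concordant = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(concordant / (pos.size * neg.size))


def bootstrap_c_index(outcome, predicted, n_resamples=2000, seed=0):
    """Mean c-index over bootstrap resamples with a 95% percentile interval.

    The percentile method is an assumption of this sketch, not a detail
    taken from the study.
    """
    y = np.asarray(outcome, dtype=bool)
    p = np.asarray(predicted, dtype=float)
    c_index(y, p)  # fails early if only one outcome class is present
    rng = np.random.default_rng(seed)
    n = y.size
    stats = []
    while len(stats) < n_resamples:
        idx = rng.integers(0, n, size=n)  # sample patients with replacement
        if y[idx].all() or not y[idx].any():
            continue  # skip resamples that lack one of the two classes
        stats.append(c_index(y[idx], p[idx]))
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return float(np.mean(stats)), float(lower), float(upper)
```

Here `outcome` would hold 1 for trifecta not achieved and 0 otherwise, and `predicted` the fitted probabilities (or the raw score) from the model under evaluation.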

External Validation of PADUA and RENAL Nephrometry Scores
In the PADUA multivariable logistic regression model predicting trifecta status, only a diameter > 4 cm (odds ratio [OR]: 3.4) and the dislocation/infiltration of the urinary collecting system (OR: 2.3) reached independent predictor status (Table 2). A bootstrapped c-index of 0.60 (0.53-0.67) indicated a poor ability of the PADUA score to discriminate trifecta status.
In the RENAL multivariable logistic regression model predicting trifecta status, only a diameter > 4 cm (OR: 3.7) and the nearness to renal sinus/urinary collecting system ≤ 4 mm (OR: 6.1) reached independent predictor status. A bootstrapped c-index of 0.62 (0.55-0.69) indicated a poor ability of the RENAL score to discriminate trifecta status.
In the SuNS multivariable logistic regression model, all the predictors maintained statistical significance (Table 3). A bootstrapped c-index of 0.74 (0.67-0.79) indicated a fair ability of the model to discriminate trifecta status, and the calibration plot depicts a good agreement between the observed and predicted outcomes (Figure 2).
We defined three complexity classes based on logistic regression ORs associated with each score point (Supplementary Table S2). Moderate complexity (5-6 points) and high complexity (7-8 points) classes showed ORs of 3.3 and 14.3, respectively, as compared to the low complexity class (3-4 points).
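The mapping from total SuNS points to complexity class can be written as a short lookup. The sketch below takes the total score as input because the points assigned to each level of the three variables are not reported in this text; the function name is hypothetical.

```python
def suns_complexity_class(total_points: int) -> str:
    """Map a total SuNS score (3 to 8 points) to its complexity class."""
    if not 3 <= total_points <= 8:
        raise ValueError("SuNS total score ranges from 3 to 8 points")
    if total_points <= 4:
        return "low"        # 3-4 points
    if total_points <= 6:
        return "moderate"   # 5-6 points
    return "high"           # 7-8 points
```

Rejecting totals outside the 3 to 8 range is a design choice of this sketch, intended to surface data-entry errors rather than silently assign a class.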
Patient distribution, distribution of trifecta achievement and determining factors according to the SuNS complexity classes are depicted in Figure 3 and Supplementary Table S3. Specifically, the rates of no trifecta achievement were the following: 11 vs. 30 vs. 65% in low (n = 256; 61%) vs. moderate (n = 128; 31%) vs. high (n = 34; 8%) complexity, respectively. Supplementary Table S4 summarizes the percentage of patients, previously scored according to PADUA and RENAL, that were reclassified according to the SuNS score.
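As a rough consistency check on these figures, crude odds ratios can be recomputed from the rounded class-level rates. The sketch below is illustrative only: it uses the rounded percentages reported above, so the results are approximate and are not expected to match the model-based ORs exactly. All names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p: float, p_ref: float) -> float:
    """Crude odds ratio of a class against the reference class."""
    return odds(p) / odds(p_ref)


# Rounded figures reported in the text for each SuNS complexity class.
n_patients = {"low": 256, "moderate": 128, "high": 34}
no_trifecta_rate = {"low": 0.11, "moderate": 0.30, "high": 0.65}

total_patients = sum(n_patients.values())
expected_events = sum(n_patients[k] * no_trifecta_rate[k] for k in n_patients)
or_moderate = odds_ratio(no_trifecta_rate["moderate"], no_trifecta_rate["low"])
or_high = odds_ratio(no_trifecta_rate["high"], no_trifecta_rate["low"])
```

With these inputs the class sizes sum to 418, the implied number of patients not achieving trifecta is close to the 89 reported, and the crude ORs come out at roughly 3.5 and 15, in line with the reported 3.3 and 14.3 once rounding of the percentages is taken into account.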

Discussion
To date, five NSs have been developed for the ablation-specific prediction of procedure complexity and postoperative outcomes. However, these models have not been adopted in clinical practice because of a lack of formal external validation [18,19], poor ability to discriminate outcomes [15,22], difficulty in score computing or inclusion of items that are not immediately available [19,21,22]. Consequently, PADUA and RENAL NSs remain the established standard methods to assess complexity in nephron-sparing procedures, thanks to their immediacy and extensive validation [11,13,14,16,19,21], despite having been developed in open surgery cohorts [7,8]. The present study aimed to develop a simple, immediate, and reproducible ablation-specific NS that better predicts peri-operative outcomes compared to RENAL and PADUA NSs. Our results showed several important findings.
First, we showed the suboptimal accuracy of PADUA (c-index: 0.60) and RENAL (c-index: 0.62) NSs in predicting trifecta status. The poor discrimination ability is reflected in the distribution of trifecta status across the complexity classes. Specifically, according to PADUA complexity-class stratification, 17% vs. 18% of patients did not achieve trifecta in low vs. moderate classes, respectively. Similarly, according to RENAL complexity-class stratification, 32% vs. 37% of patients did not achieve trifecta in moderate vs. high classes, respectively. When we reproduced PADUA and RENAL multivariable models, the only items predicting trifecta status were tumour size >4 cm and nearness to renal sinus or urinary collecting system. These associations were previously detected by Maxwell et al. [16] in a similar but smaller ablation-treated cohort. These results confirm that, although PADUA and RENAL NSs provide a fair to poor peri-operative outcome discrimination ability according to many PTA cohorts [11][12][13][14][15][16][17], the majority of the items on which these scores are based do not influence outcome prediction. On this basis, testing new parameters or, at least, recoding established items related to peri-operative outcomes is needed to achieve a better prediction. However, it needs to be stated that, to the best of our knowledge, we are the first to test the ability of NSs to predict an ablation-specific definition of trifecta. Consequently, direct comparison with other studies is not recommended.
Second, based on the aforementioned assumptions, we tested all the available outcome-related variables recorded in our dataset in a univariable fashion, recoding established NS parameters to better fit the PTA technique and selected SRM characteristics. Following the concept of a complexity score, we did not include clinical patient characteristics in the univariable analysis. Of all variables tested, only size (diameter; >3 and ≤4 cm OR: 2.7, >4 cm OR: 5.6; p < 0.001), CSA (≥10 cm² OR: 3.9; p < 0.001) and nearness to renal sinus or urinary collecting system (≤4 mm OR: 4.3; p < 0.001) emerged as strongly significant risk factors for not achieving trifecta. Specifically, we recoded PADUA and RENAL tumour size as ≤3 cm vs. >3 and ≤4 cm vs. >4 cm as suggested by previous analyses [30]. Moreover, we dichotomized the RENAL variable nearness to renal sinus or urinary collecting system into >4 mm vs. ≤4 mm, accounting for the higher precision of the PTA technique compared to PN. Furthermore, to the best of our knowledge, CSA has never been tested in a PTA setting. This variable accurately sums up the concepts of steric tumour shape and endophyticity/exophyticity [23,24]. Finally, among other variables tested, the nearness to the ureter and bowel were not strong predictors of trifecta status, probably due to the standardized use of hydrodissection techniques when needed. Despite the wide variety of available items tested, we were not able to validate some previously tested variables that could prove to be significant predictors such as, for example, the lesion enhancement pattern [16]. Consequently, future analyses should focus on these specific variables to predict trifecta status.
Third, we fitted a multivariable logistic regression model based on the three described variables: su(rface), n(earness to renal sinus or urinary collecting system), s(ize). This novel model showed good calibration and greater accuracy (c-index: 0.74) when compared to PADUA and RENAL NSs. Similar to the PADUA and RENAL development process, we assigned a score to each variable level based on the ORs, with total points ranging from 3 to 8. Subsequently, we generated complexity classes based on the ORs of the categorically coded score and on trifecta achievement rates. SuNS moderate (5-6 points) and high complexity (7-8 points) classes showed an OR of 3.3 and 14.3, respectively, when low complexity (3-4 points) was the reference class. Compared to PADUA and RENAL NSs, the SuNS score discrimination ability is also reflected in a more representative distribution of patients overall (61% vs. 31% vs. 8% in low vs. moderate vs. high complexity) and of patients not achieving trifecta (11% vs. 30% vs. 65% in low vs. moderate vs. high) across complexity classes. Moreover, we also observed important discrepancies in patient distribution across PADUA, RENAL and SuNS NSs. For example, 40% of patients reclassified in the SuNS low complexity class were classified as moderate/high class with the PADUA scoring system. Moreover, 74% of patients reclassified in the high complexity class according to the SuNS score were initially classified as low/moderate class according to RENAL. Additionally, the SuNS score can be easily adopted in clinical practice, as it comprises only three variables that are readily derivable from preoperative imaging. Therefore, high-quality diagnostic imaging, which should always consist of a CECT including urographic phases or magnetic resonance imaging, is of crucial importance for preoperative outcome prediction. Last but not least, a sensitivity analysis showed good accuracy for outcome prediction for both RF and MW techniques.
Taken together, we developed an immediate, simple, and reproducible ablation-specific NS (SuNS) that outperformed PADUA and RENAL NSs in predicting trifecta status.
Despite its novelty, our study is not devoid of limitations. First, the current data are retrospective and influenced by inherent selection bias. Second, this study represents the practice of a high-volume oncological referral centre, and our results might not be readily generalizable. Third, the study was a single-centre evaluation, potentially creating bias in techniques or outcomes. Fourth, due to missing data, we were unable to perform a formal comparison with other NSs previously reported [9,10,18,19,21,22]. Fifth, despite bootstrap resampling, the lack of external validation currently limits predictions concerning the possible impact of our NS in clinical practice. Sixth, our cohort did not involve patients treated with cryoablation and further validation in this setting is needed to extend the SuNS score application. Last, we lack other important PTA outcomes, such as ablation time and local recurrences, which should, therefore, be tested in future analyses.

Conclusions
We developed an immediate, simple, and reproducible ablation-specific NS (SuNS) that outperformed PADUA and RENAL NSs in predicting peri-operative outcomes. External validation is required before daily practice implementation.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13182955/s1, Supplementary Figure S1: Stacked bar plots depicting trifecta not achieved according to complexity classes: (a) trifecta status according to SuNS in microwave-treated patients; (b) trifecta status according to SuNS in radiofrequency-treated patients; Supplementary Table S1: Separate univariable logistic regression models predicting trifecta status (trifecta achieved vs. trifecta not achieved) in patients diagnosed with small renal masses between 2008 and 2021 and treated with ablation; Supplementary Table S2: Rates of trifecta achieved and univariable logistic regression predicting trifecta status (trifecta achieved vs. trifecta not achieved) according to SuNS score and derived complexity classes; Supplementary Table S3: Descriptive characteristics of 418 patients diagnosed with small renal masses between 2008 and 2021 and treated with ablation. Stratification is made according to SuNS score complexity class (low

Figure 1.Stacked bar plots depicting trifecta not achieved and factors determining trifecta status distribution according to complexity classes: (a) trifecta status according to PADUA; (b) trifecta factors according to PADUA; (c) trifecta status according to RENAL; (d) trifecta factors according to RENAL.

Figure 2. Calibration plot of observed proportions versus predicted probabilities of not achieving trifecta according to the SuNS model.

Figure 3. Stacked bar plots depicting trifecta not achieved and factors determining trifecta status distribution according to complexity classes: (a) trifecta status according to SuNS; (b) trifecta factors according to SuNS.


Table 2. Separate multivariable logistic regression models for PADUA and RENAL predicting trifecta status (trifecta achieved vs. trifecta not achieved) in patients diagnosed with small renal masses between 2008 and 2021 and treated with ablation. Bootstrapped c-index represents the discrimination ability of the score generated by the models.

Table 3. Multivariable logistic regression models predicting trifecta status (trifecta achieved vs. trifecta not achieved) in patients diagnosed with small renal masses between 2008 and 2021 and treated with ablation. Bootstrapped c-index represents the discrimination ability of the score generated by the models. Bold values indicate statistical significance (p < 0.05). SuNS: surface, nearness to renal sinus or collecting system, size; CI: confidence interval.