Identification of an Upper Limit of Tumor Burden for Downstaging in Candidates with Hepatocellular Cancer Waiting for Liver Transplantation: A West–East Collaborative Effort

Since the introduction of the Milan Criteria, all scoring models describing the prognosis of hepatocellular cancer (HCC) after liver transplantation (LT) have been exclusively based on characteristics available at surgery, therefore neglecting the intention-to-treat principle. This study aimed at developing an intention-to-treat model through a competing-risk analysis. Using data available at first referral, an upper limit of tumor burden for downstaging was identified beyond which successful LT becomes an unrealistic goal. Twelve centers in Europe, the United States, and Asia (Brussels, Sapienza Rome, Padua, Columbia University New York, Innsbruck, Medanta-The Medicity Delhi, Hong Kong, Kyoto, Kaohsiung Taiwan, Mainz, Fukuoka, Shulan Hospital Hangzhou) created a Derivation (n = 2318) and a Validation Set (n = 773) of HCC patients listed for LT between January 2000 and March 2017. In the Derivation Set, the competing-risk analysis identified two independent covariables predicting post-transplant HCC-related death: combined HCC number and diameter (SHR = 1.15; p < 0.001) and alpha-fetoprotein (AFP) (SHR = 1.80; p < 0.001). The WE-DS Model showed good diagnostic performance at internal and external validation. The identified upper limit of tumor burden for downstaging was AFP ≤ 20 ng/mL and up-to-twelve as sum of HCC number and diameter; AFP = 21–200 and up-to-ten; AFP = 201–500 and up-to-seven; AFP = 501–1000 and up-to-five. The WE-DS Model proposed here, based on morphologic and biologic data obtained at first referral in a large international cohort of HCC patients listed for LT, allowed the identification of an upper limit of tumor burden for downstaging beyond which LT, even after downstaging, results in a futile transplantation.


Introduction
For a quarter of a century, the morphologic Milan Criteria (MC) have been the cornerstone for selecting patients with hepatocellular cancer (HCC) for liver transplantation (LT) [1]. Many Western and Eastern centers successfully went beyond these stringent criteria and were able to transplant a higher number of patients without increasing the risks of post-LT recurrence and death [2][3][4][5][6]. More recently, several variables looking at tumor biology have been introduced into clinical practice [7][8][9][10]. The combination of dynamic biological and morphological tumor characteristics following different downstaging (DS) therapies emerged as a promising tool to fine-tune the selection process of LT candidates in terms of delisting and post-LT recurrence [11][12][13][14][15][16][17][18]. In this context, the identification of an upper limit of tumor burden for DS, evaluated at first referral, is still a matter of debate. Apart from limited experiences, most prognostic HCC-LT models are based on criteria available at LT, therefore failing to consider the intention-to-treat (ITT) principle, namely failing to capture the risk of death or dropout during the waiting time period [11][14][15][16][17]. Only a few groups aimed at identifying well-defined eligibility criteria for DS; among them, the UNOS-downstaging (UNOS-DS) criteria are the most commonly adopted [19,20]. Conversely, many centers apply a so-called "all-comers" policy, enrolling potential HCC recipients in a DS protocol without defining an upper limit of tumor burden [21]. Consequently, a better identification of a clinically applicable upper limit of DS would be of great importance to minimize the risk of futile LT, namely an excessively high risk of tumor recurrence after transplant.
Moreover, the possibility to identify patients at high risk for futile transplant or, on the contrary, requiring a fast-track approach should be of great relevance in the donor selection process and in the setting of living donation program expansion [22].
This study aims to develop an ITT model able to predict the risk of dying after LT due to HCC recurrence based on variables available at first referral, and also to identify an upper limit of tumor burden. We hypothesize that a successful LT after DS becomes an unrealistic goal in patients exceeding this limit. The model was constructed using derivation and validation cohorts from centers belonging to an Eastern-Western collaborative HCC-LT consortium.

Results
The clinical and tumor characteristics of the entire population are displayed in Table 1. At first referral, 965 of 3091 (31.2%) patients did not meet MC. A total of 2369 (76.6%) patients were treated with at least one loco-regional treatment (LRT) before delisting or LT, and 1014 (32.8%) patients had living-donor LT (LDLT). Hepatitis C virus (HCV)-related cirrhosis was the most common indication for listing (n = 1396; 45.2%). The only statistically significant difference found between the derivation and validation data sets was the waiting time (3.7 vs. 4.5 months, respectively; p = 0.008). During a median follow-up of 40.8 months (IQR: 18.0-80.0), 2735 (88.5%) patients were transplanted, 303 (9.8%) were delisted, and 53 (1.7%) were still on the waiting list at the end of the study period. Delisting due to tumor progression or post-LRT liver failure occurred in 177 (5.7%) patients. HCC-independent delisting due to liver failure or death while on the waiting list occurred in 73 (2.4%) and 53 (1.7%) patients, respectively. Overall, 367 (13.4%) of 2735 transplanted patients experienced HCC recurrence, after a median time of 13.0 months (IQR: 7.0-28.5); 246 (9.0%) died, and 121 (4.4%) were still alive at last follow-up. Death from other causes occurred in 452 (16.5%) patients.
Cumulative incidences of being delisted, being transplanted, dying of HCC recurrence, and dying after LT due to other causes are depicted in Figure 1.
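As a rough illustration of how cumulative incidences of competing outcomes such as those in Figure 1 can be estimated, the sketch below implements the standard nonparametric (Aalen-Johansen type) estimator in plain Python. It is not the authors' code (the analyses in this study were run in R); the function name, the integer event codes, and the toy data are assumptions made only for this example.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Nonparametric cumulative incidence function for one competing event.

    times  : follow-up time of each patient
    events : 0 = censored, any other integer = type of competing event
             (the coding is hypothetical, chosen for this sketch)
    cause  : event code whose cumulative incidence is wanted
    Returns a list of (time, cumulative incidence) at each time with an event.
    """
    counts = Counter(zip(times, events))   # (time, event code) -> count
    leaving = Counter(times)               # patients leaving the risk set at each time
    n_at_risk = len(times)
    event_free = 1.0                       # probability of no event of any type so far
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        d_all = sum(c for (tt, e), c in counts.items() if tt == t and e != 0)
        d_cause = counts.get((t, cause), 0)
        if d_all > 0:
            # Increment = P(event-free just before t) * cause-specific hazard at t
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= leaving[t]
    return curve


# Toy data: 1 = HCC-related death, 2 = death from other causes, 0 = censored
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 2, 0, 1, 0]
hcc_curve = cumulative_incidence(toy_times, toy_events, cause=1)
```

Unlike one minus a Kaplan-Meier estimate that censors the competing events, this estimator does not overstate the probability of the event of interest when competing events are frequent.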

Development of the WE-DS Model
The results of the multivariable competing-risk regression built using the Derivation Set data (n = 2318) are displayed in Table 2. Ten variables available at first entry were investigated: the number of lesions, the diameter of the largest lesion, AFP, Western center localization, MELD, previous LRT, HBV status, age, living-donor LT, HCV status and period of first referral (before vs. after 2010). Only variables obtained at first referral were used for constructing the WE-DS Model. This model identified two independent risk factors for post-transplant HCC-related death: number of lesions plus diameter of the largest lesion (SHR = 1.15; p < 0.001) and AFP level (SHR = 1.87; p < 0.001).
Graphical representations of the Model are visualized in Figure 2 as contour plots. The WE-DS Model (first referral time point; Figure 2A) was compared with the Metroticket 2.0 Score (LT time point; Figure 2B).

Internal and External Validation of the WE-DS Model
Both the Derivation (n = 2318) and Validation (n = 773) Sets were used to evaluate the diagnostic ability of the WE-DS Model in terms of risk of post-transplant HCC-related death (Table 3). At internal validation, the WE-DS Model exhibited the highest c-statistic (AUC = 0.70; 95% CI = 0.67-0.74; p < 0.001). Metroticket 2.0, HALTHCC Score, Pre-MORAL Score, AFP-French Model, and MC status showed AUCs ranging from 0.60 to 0.63; all the other tested scores had AUCs <0.60.
Table 3. Comparison between different scores for the prediction of post-transplant HCC-related death in the Derivation (n = 2318) and Validation (n = 773) Sets.

Identification of the Upper Tumor Burden Limit for Downstaging
The relationship between the Metroticket 2.0 Score calculated at the time of LT and the WE-DS Model evaluated at first referral is plotted in Figure 3A. Mazzaferro et al. previously set the upper limit of acceptable five-year risk of post-LT HCC-related death, calculated from the time of LT, at ≤30% [18]. After recalibration from the time of first referral, this acceptable risk was set to ≤13%. According to these results, an upper limit of tumor burden was identified (Figure 3B) based on four incremental combinations of morphology and AFP values. In particular, a patient would be considered within the WE-DS Model (and thus evaluable for potential transplantation) when meeting one of the following combinations: AFP ≤20 ng/mL and up-to-twelve as the sum of diameter and number of tumor lesions; AFP = 21-200 ng/mL and up-to-ten; AFP = 201-500 ng/mL and up-to-seven; AFP = 501-1000 ng/mL and up-to-five. When comparing the WE-DS Model with the MC and UNOS-DS criteria in the entire population of 3091 listed patients (Derivation and Validation Sets), no significant differences were observed in terms of dropout rates (8.5% to 10.0%), with the sole exception of the WE-DS-OUT cases (12.1%; log-rank p = 0.003). After removing the cohort of 1533 Asian HCC patients, in which no dropouts were reported, the differences among the Western centers became more pronounced, with 30.4% of dropouts in the WE-DS-OUT group vs. 13.0% to 16.5% observed using the other criteria (log-rank p = 0.001).
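The four AFP-dependent combinations listed above can be written as a simple lookup rule. The following is a minimal sketch, not an official calculator: the function names are hypothetical, the largest diameter is assumed to be expressed in centimeters (the unit is not stated in this section), AFP values falling between two reported bands (e.g., 20.5 ng/mL) are assigned to the higher band, and patients with AFP above 1000 ng/mL are assumed to fall outside the model because no combination is reported for them.

```python
from typing import Optional

# (upper AFP bound in ng/mL, maximum allowed sum of nodule number and
# largest diameter); the values are the four combinations reported in the text
WE_DS_LIMITS = [(20, 12), (200, 10), (500, 7), (1000, 5)]


def we_ds_max_sum(afp_ng_ml: float) -> Optional[int]:
    """Return the allowed 'up-to' sum for a given AFP, or None if AFP > 1000."""
    if afp_ng_ml < 0:
        raise ValueError("AFP cannot be negative")
    for upper_afp, max_sum in WE_DS_LIMITS:
        if afp_ng_ml <= upper_afp:
            return max_sum
    return None  # assumption: no acceptable tumor burden above 1000 ng/mL


def within_we_ds(afp_ng_ml: float, n_nodules: int, largest_diameter_cm: float) -> bool:
    """True if the first-referral tumor burden is within the WE-DS limit."""
    limit = we_ds_max_sum(afp_ng_ml)
    if limit is None:
        return False
    return n_nodules + largest_diameter_cm <= limit


# Hypothetical patients at first referral
print(within_we_ds(15, 3, 8))    # sum 11 with AFP <= 20 (limit 12)
print(within_we_ds(150, 3, 8))   # sum 11 with AFP 21-200 (limit 10)
```

The same morphology (three nodules, largest 8 cm) is therefore inside the limit at low AFP and outside it once AFP exceeds 20 ng/mL, which is the intended interplay between tumor morphology and biology.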
Substantial differences were also reported when the 2735 transplanted patients were analyzed. When the five-year 30% threshold for post-LT HCC-related death was considered (Figure 4), all groups presented results within this threshold except for the WE-DS-OUT group, thereby confirming the better selection ability of the model proposed here.

Discussion
The primary objective of the study was to predict in a "weighty manner" the risk of post-LT HCC-related death in transplant candidates, with the intent to identify an upper limit of DS and to standardize the approach to LT listing based on a large HCC-LT database from collaborating expert Western and Eastern centers. The currently available scoring systems mainly focus on the estimation of the risk of post-transplant HCC-related death using variables known at the time of LT. Therefore, information on intention-to-treat survival is lacking [1][2][3][4][5][6][11][12][13][14][15][16][18]. To avoid this shortcoming, data available at the first referral time-point were investigated. First referral was judged to be a more useful and reliable time point for prognostication of survival, and one that better reflects real-life practice. In fact, the preliminary decisions to include a patient in an LT program, or to treat the patient with LRT, are taken at the time of the first encounter in the out-patient clinic.
The WE-DS Model presented here showed a high ability to identify patients with a reduced chance of being successfully transplanted after DS therapies. DS has recently been shown to be a useful prognosticator as well as an identifier of patients presenting favorable tumor biology. Several studies demonstrated that successful DS yields post-LT survival rates similar to those obtained with conventional LT criteria [8][9][10][19][20]. However, controversies still exist concerning the optimal upper limit of tumor burden to use for DS. According to Mazzaferro's paper aiming at "squaring the circle" of HCC selection and allocation for LT, effectively downstaged patients should obtain a higher priority for LT, thereby minimizing the risk of dropout during the waiting time. However, the initial acceptance of "all-comers" needs to be weighed against the potentially unacceptable risk of transplant futility (i.e., very high post-transplant HCC-related death rates) [23]. Data about the acceptable upper limit of tumor burden for entering a liver transplant program are very scarce [19,20]. With the intent to standardize criteria for downstaging in the United States, UNOS recently implemented the UCSF/Region 5 DS-protocol as a new national policy for granting MELD exception for LT [24].
A recently published study investigating this UNOS-DS protocol showed that the "all-comers" presented a significantly lower rate of successful DS, a higher probability of dropout from the waiting list, and a lower 5-year ITT survival when compared to patients meeting the UCSF-DS criteria [25].
Another US study, including 3819 patients, identified in a multivariable analysis a short-to-mid waiting time (HR = 3.1; p = 0.005) and an AFP ≥100 ng/mL at LT (HR = 2.4; p = 0.009) as risk factors for post-transplant death [21]. According to these results, additional refinements based on AFP and waiting time were suggested in order to optimize the discriminatory ability of the UNOS-DS score further [21].
The WE-DS Model represents the first attempt to create a score integrating tumor morphology and biology to select the upper limit of DS. The advantage of integrating tumor morphology and biology is based on the concept that patients with high tumor number, diameter, or AFP at baseline are less likely to be successfully down-staged, or are more likely to have tumor progression after initial down-staging, and thus ultimately drop out or experience a high recurrence rate after LT [21]. An advanced statistical analysis, based on three different competing events, allowed the construction of a mathematically robust model able to simultaneously estimate the probabilities of receiving a transplant and of dying due to HCC recurrence, a piece of information sharply differing from the isolated evaluation of dying from tumor-related causes after LT [18]. Moreover, the large Western-Eastern sample size and the relevant number of events (namely, post-LT HCC-related deaths) further allowed a robust statistical analysis.
The variables composing the WE-DS Model, namely HCC number, diameter, and AFP, were all identical to those observed in previously reported scores [11][12][13][18]. All of them are well-recognized risk factors for post-transplant HCC-related death and, most importantly, they all are available at patient first referral.
The inclusion of HCC patients belonging to the Western and Eastern world strengthens in our opinion the observed results, representing a unique approach to confirm the universal usability of the proposed scoring system. In fact, locally developed scores might be unbalanced by regional characteristics.
The reported results further reveal that patients out of the WE-DS Model had a significantly higher dropout rate, particularly in the Western cohort. This observation can be explained by the low incidence of LDLT in this geographical area, leading to longer waiting times and, therefore, higher dropout rates. Importantly, when only transplanted cases were analyzed, "all-comers" beyond the WE-DS Model was the only cohort to have five-year HCC-related death rates surpassing 30%, a value corresponding to a futile LT [18]. Interestingly, "all-comers" out of UNOS-DS remained below this 30% threshold, indicating a suboptimal stratification performance.
When the diagnostic ability for the risk of HCC-related post-LT death was internally and externally tested, the WE-DS Model had the best discriminatory power when compared to several previously proposed "urgency" or "utility" scores, all of them based on data obtained at the time of LT [1][2][3][4][5][6][11][12][13][14][15][16][18][19]. The Metroticket 2.0, the AFP French Model and the HALTHCC also had an excellent diagnostic performance, mainly revealed in the Validation Set [11,12,18]. Interestingly, all three of these scores are composed of the same three variables, namely AFP, tumor number, and diameter.
This study can be criticized for comparing the WE-DS Model, based on data obtained at first referral, with scores based on data available at the time of transplantation. Unfortunately, no other study investigated the prognostic strength of a model exclusively based on 'first come' variables. Only the French AFP Model included data obtained at the time of listing [11]. The TRAIN score, also based on intention-to-treat data [17], was not tested here because it is based on "dynamic" AFP and tumor radiological modifications following LRT (and thus not on data available at first referral).
When the WE-DS Model was compared with the currently adopted MC and UNOS-DS criteria [1,19] in terms of dropout and post-LT HCC-related death, it was interesting to observe that only the WE-DS-OUT patients showed poor results. In detail, if the 5-year 30% cut-off for HCC-related death proposed by Mazzaferro was considered [18], only the WE-DS-OUT cases exceeded this value, further underlining the selection ability of the WE-DS Model.
The present study has some limitations. Firstly, the heterogeneity of the investigated population may be regarded as the main drawback of the study, due to the different match/allocation systems and LRT approaches used among the various centers. However, we feel that these differences may represent a benefit concerning the design of the study, namely the creation of a mathematical model based only on internationally relevant risk factors, thereby avoiding center-related biases. Secondly, the retrospective nature of the study potentially affects the power to appraise variables such as radiological characteristics. This limitation is shared with all the studies including large HCC patient cohorts [4][10][11][12][14][15][16][17][18]. The considerable variability of treatment management among centers might represent a third point of weakness. As this study relies on first-referral data only, differences in LRT strategies are bypassed; moreover, the use of LRT did not affect any of the competing-risk models. This may be explained by the fact that post-LRT response rather than LRT per se influences delisting and post-transplant recurrence risk [17]. Unfortunately, the decision to use only first-referral variables for the construction of the model limited our opportunity to capture the prognostic role of response to bridging or downstaging LRT. Several studies revealed that radiological response to LRT is a better predictor of tumor biology than tumor morphology alone [15,17,25]. Although constructing a model without radiological response to LRT is an understandable limitation, it was deliberately decided not to investigate this parameter. In fact, radiological response after LRT is available only at the time of LT, making it impossible to use this variable for constructing a model based only on "first referral" variables.
Preliminary study protocol, data collection management, and study coordination among the involved centers were realized by JL (Scientific Coordinator, Brussels) and QL (Data Manager, Sapienza Rome).

Study Design
All the patients consecutively evaluated and enlisted for LT with a radiological diagnosis of HCC were considered recruitable for the present study. Nature of LT (upfront vs. salvage) or type and number of neo-adjuvant treatments were not considered exclusion criteria. After the exclusion of patients with mixed hepatocellular-cholangiocellular cancer, cholangiocarcinoma misdiagnosed as HCC, and incidental HCC, the studied sample numbered 3091 cases. A Derivation Set of 2318 candidates (75.0%) and a Validation Set of 773 candidates (25.0%) were obtained from the entire population. Block randomization was performed to maintain a similar representation of Western and Eastern centers in the two data sets (Supplementary materials).
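The exact randomization procedure is described in the Supplementary materials and is not reproduced here. As a loose illustration of the idea of keeping Western and Eastern centers similarly represented in a 75%/25% split, the sketch below uses a plain stratified random split, which is a stand-in technique and not necessarily the block randomization used by the authors. The function name, the `stratum_of` callback, and the toy patient records are hypothetical.

```python
import random


def stratified_split(patients, stratum_of, derivation_fraction=0.75, seed=0):
    """Randomly split patients into derivation/validation sets within each stratum.

    patients            : list of patient records (any objects)
    stratum_of          : function mapping a record to its stratum label,
                          e.g. "West" or "East" (hypothetical labels)
    derivation_fraction : share of each stratum sent to the derivation set
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = {}
    for patient in patients:
        strata.setdefault(stratum_of(patient), []).append(patient)

    derivation, validation = [], []
    for label in sorted(strata):
        members = strata[label][:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        cut = round(len(members) * derivation_fraction)
        derivation.extend(members[:cut])
        validation.extend(members[cut:])
    return derivation, validation


# Toy cohort: 40 "West" and 60 "East" records, identified by an integer
toy_cohort = [(i, "West" if i < 40 else "East") for i in range(100)]
derivation_set, validation_set = stratified_split(toy_cohort, lambda p: p[1])
```

Because the split is done separately within each stratum, the West/East proportions in both resulting sets match those of the full cohort up to rounding.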

Hepatocellular Cancer Management and Definitions
The diagnosis of HCC was made according to international guidelines [26][27][28]. Tumor upper limits for transplantability significantly differed among the centers, with a more conservative approach in the West and a more aggressive approach in the East, namely with higher percentages of enlisted patients initially out of the conventional LT criteria (Supplementary Table S1).
The decision to perform loco-regional treatments (LRT) was also different among the centers, although the possibility of down-staging or bridging was shared in case of expected long waiting times [26][27][28]. Several variables were collected at the time of first referral and of LT or delisting with the intent to construct a comprehensive ITT model using only first-referral variables. The diameter of the largest tumor and the number of nodules were evaluated taking only into account the vital tissue as identified by arterial enhancement. Response to neoadjuvant therapy was prospectively assessed using the modified Response Evaluation Criteria in Solid Tumours (mRECIST) in patients listed after 2010. In patients enlisted before 2010, mRECIST were evaluated retrospectively by local radiologists.
The Institutional Ethical and Scientific Review board of the coordinating center approved the study (Sapienza University of Rome, approval code 1214/2019). All the other centers obtained local approvals according to the ethical rules concerning retrospective studies. The study was registered at http://www.ClinicalTrials.gov (ID: NCT03595345). All data in relation to liver transplantations performed in the People's Republic of China were obtained after the modification of the transplant law banning the use of organs from executed prisoners (year 2015).

Statistical Analysis
Continuous variables were reported as medians and interquartile ranges (IQR). Dummy variables were reported as numbers and percentages. The maximum likelihood estimation method for managing missing data was used [29]. For all the variables used for constructing the models, the missing data were always <5%. Mann-Whitney U test and Fisher's exact test were used for comparisons of continuous and categorical variables, respectively.
The model was constructed using the Derivation Set data and adopting the Fine and Gray methodology for competing-risk regressions [30]. Sub-hazard ratios (SHR) and 95% confidence intervals (95%CI) were reported. Three competing events were investigated: (1) delisting; (2) death after LT due to HCC recurrence; and (3) death after LT due to non-HCC-related causes. Delisting was defined as any event of dropout or death whilst on the waiting list. Post-transplant HCC-related death was defined as any death directly related to tumor recurrence after LT. All patients who died of causes other than HCC or were alive with recurrence at the date of the last follow-up visit were censored. The last censoring was performed on 31 March 2017. Our attention was focused on the second model, namely, the risk of dying after LT due to tumor recurrence. The accuracy of the model was assessed in the Derivation and Validation Sets through c-statistics. Confidence intervals for the c-statistic were derived from 100 bootstrap replications. The accuracy of the model was compared with several previously proposed criteria able to predict the risk of dropout or post-transplant recurrence [1][2][3][4][5][6][11][12][13][14][15][16][17][18].
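To make the idea of a c-statistic with a bootstrap confidence interval concrete, the sketch below computes a concordance index for a binary outcome and a percentile interval from 100 resamples. This is a simplification and not the authors' procedure: the study's analyses were run in R in a competing-risk setting, where the c-statistic may be time-dependent. The function names, the percentile method, and the toy data are assumptions made for illustration.

```python
import random


def c_statistic(scores, outcomes):
    """Probability that a random case (outcome 1) scores higher than a random
    non-case (outcome 0); ties count as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_ci(scores, outcomes, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the c-statistic."""
    if not 0 < sum(outcomes) < len(outcomes):
        raise ValueError("outcomes must contain both 0 and 1")
    rng = random.Random(seed)
    indices = range(len(scores))
    replicates = []
    while len(replicates) < n_boot:
        sample = [rng.choice(indices) for _ in indices]  # resample with replacement
        ys = [outcomes[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples lacking cases or non-cases
            replicates.append(c_statistic([scores[i] for i in sample], ys))
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: predicted risks and observed HCC-related deaths (1 = died)
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.05]
toy_outcomes = [0, 0, 1, 1, 0, 1, 0, 0]
auc = c_statistic(toy_scores, toy_outcomes)
ci_low, ci_high = bootstrap_ci(toy_scores, toy_outcomes, n_boot=100)
```

With only 100 replications the interval bounds are fairly coarse; a larger number of resamples would give more stable percentiles at the cost of computation time.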
After adopting the results obtained from the proposed model, an upper limit of tumor burden for DS was identified. The five-year acceptable risk for post-LT HCC-related death ≤30% proposed in the Metroticket 2.0 study was considered [18]. This value was "recalibrated" in an ITT fashion. This "recalibration" allowed identifying a tumor burden threshold based on a combination of alpha-fetoprotein (AFP) level, tumor number, and diameter, all of them being available at first referral. These parameters participated in creating the West-East DownStaging Model (WE-DS). A more detailed explanation of this estimation is reported in Supplementary materials.
The Kaplan-Meier method was used to evaluate the observed dropouts and post-transplant HCC-related deaths. Different groups determined according to different tumor burden available at first referral were compared: (1) [...]. Variables with a p < 0.05 were considered statistically significant. SPSS statistical package version 24.0 (SPSS Inc., Chicago, IL, USA) was used, and competing-risk analyses were done using the packages "cmprsk", "riskRegression", "crrSC", and "pec" of R-project (R version 3.4.3; R Foundation for Statistical Computing, Vienna, Austria).

Conclusions
In conclusion, the WE-DS Model, based on both morphologic and biologic data obtained at first referral in a large international (Western-Eastern) cohort of HCC patients listed for LT, allowed the identification of an upper limit of tumor burden for downstaging beyond which LT, even after downstaging, results in a futile transplantation.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/2/452/s1, Table S1: Patients out of Milan Criteria and Up-to-Seven Criteria at first referral and liver transplantation in the different centres.