Liver Transplantation for Hepatocellular Carcinoma: A Real-Life Comparison of Milan Criteria and AFP Model

Simple Summary
The α-fetoprotein (AFP) model officially replaced the Milan criteria for liver transplantation (LT) for hepatocellular carcinoma (HCC) in France in January 2013. The aim of our retrospective study was to analyze agreement with the criteria and the results of LT in an intention-to-treat design since the adoption of the AFP model, and to compare them with practice and results before its adoption. Among 523 consecutively listed patients, we did not observe significant changes in practice, and agreement with the AFP criteria on the explants was good (88%) both before and after adoption of the AFP model. However, the prognosis of patients listed in the most recent period was worse, possibly because of a significant increase in bridging treatments and in waiting time. This observational study provides insight into the real-life course of LT for HCC.

Abstract
Purpose: To compare agreement with the criteria on the explant and the results of liver transplantation (LT) before and after adoption of the AFP (α-fetoprotein) model.
Methods: 523 patients consecutively listed in five French centers were reviewed to compare results between the Milan criteria period (MilanCP, n = 199; before 2013) and the AFP score period (AFPscP, n = 324; after 2013) (NCT03156582).
Results: During the AFPscP, waiting time on the list was significantly longer (12.3 vs. 7.7 months, p < 0.001) and the rate of bridging therapies was higher (84 vs. 75%, p = 0.012) than in the MilanCP. The dropout rate was slightly, but not significantly, higher in the AFPscP (31 vs. 24%, p = 0.073). No difference was found in the histological AFP score between groups (p = 0.838), with overall agreement in 88% of patients. Post-LT recurrence was 9.2% in the MilanCP vs. 13.2% in the AFPscP (p = 0.239), and predictive factors were an AFP score > 2 on the last imaging, a downstaging policy and salvage transplantation. Post-LT survival was similar (83 vs. 87% after 2 years, p = 0.100), but after propensity score analysis, post-listing overall survival (OS) was worse in the AFPscP (HR 1.45, p = 0.045).
Conclusions: Agreement with the AFP model (score ≤ 2) on explant analysis did not change significantly. An AFP score > 2 was the major prognostic factor for recurrence. Graft allocation policy had a major impact on prognosis: post-listing OS decreased significantly, probably because of the longer waiting time, more bridging therapies, the downstaging policy and salvage transplantation.


Introduction
The success of liver transplantation (LT) for hepatocellular carcinoma (HCC) depends on achieving similar outcomes in HCC and non-HCC recipients, as they compete directly on a large waiting list under a prioritization system that is constantly being questioned. The risk of tumor recurrence must therefore be as low as possible, and HCC candidates for LT must be strictly selected in the context of organ shortage.
International guidelines have considered the Milan criteria the standard for selecting HCC patients for deceased-donor LT [1], as a guarantee of good outcomes, with an actuarial survival rate of 75% at four years in the original 1996 publication [2], confirmed in 2011 [3]. 'Expanded' criteria, developed to give wider access to patients with HCC who may benefit from LT, were not adopted in the latest European and American recommendations [4,5].
On the other hand, the prognostic value of the AFP level at the time of listing, and probably even more at the time of LT, has been demonstrated over time and is now well established [1]. AFP kinetics, with an increase of more than 7.5 ng/mL/month, have been shown to predict post-LT recurrence [6], while the AFP response to locoregional therapies predicts good outcomes [7]. An AFP level > 1000 ng/mL could represent an exclusion criterion for LT [8], as proposed by UNOS in 2016. Combined with the response to locoregional therapies (LRTs), the AFP response identifies good transplant candidates [9].
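As a worked illustration of the AFP kinetics criterion, the sketch below computes an average monthly AFP slope from two dated measurements and flags a rise above the 7.5 ng/mL/month threshold quoted above. It is a minimal sketch under stated assumptions: a linear change between the two time points and an average month of 30.44 days; the cited study [6] may have derived the slope differently, and `afp_slope` and `rapid_afp_rise` are hypothetical helper names.

```python
from datetime import date

# Threshold quoted in the text: AFP rising by more than 7.5 ng/mL per month.
AFP_SLOPE_THRESHOLD = 7.5  # ng/mL/month
DAYS_PER_MONTH = 30.44     # assumption: average calendar month length


def afp_slope(afp_first: float, afp_last: float, first: date, last: date) -> float:
    """Average AFP change (ng/mL per month) between two dated measurements."""
    days = (last - first).days
    if days <= 0:
        raise ValueError("the second measurement must be dated after the first")
    return (afp_last - afp_first) / (days / DAYS_PER_MONTH)


def rapid_afp_rise(afp_first: float, afp_last: float, first: date, last: date) -> bool:
    """True when the average monthly slope exceeds the quoted threshold."""
    return afp_slope(afp_first, afp_last, first, last) > AFP_SLOPE_THRESHOLD


# Usage: AFP rising from 20 to 80 ng/mL over three months (about 20 ng/mL/month).
print(round(afp_slope(20, 80, date(2024, 1, 1), date(2024, 4, 1)), 1))
print(rapid_afp_rise(20, 80, date(2024, 1, 1), date(2024, 4, 1)))
```

With only two measurements the slope is sensitive to the choice of time points; a regression over serial values would be more robust.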
The AFP model (Table 1), established in 2012 [10], accounts for the AFP level, the number of nodules and the size of the largest nodule. It has been shown to outperform the Milan criteria in identifying candidates at low risk of HCC recurrence or likely to survive 5 years after LT, as have the revised Up-to-Seven criteria, also called the Metroticket V2.0 model [11], developed from the original Up-to-Seven criteria [12]. The strength of the AFP model lies in its demonstrated improvement in classifying HCC into low- and high-risk groups. Because the Milan criteria were frequently exceeded in practice, the AFP model has been endorsed in France since January 2013, and HCC candidates must now have an AFP score ≤ 2 to remain on the waiting list or to be reintegrated after downstaging.
Table 1. Calculation of the AFP (α-fetoprotein) score. The score is calculated by adding the points for each variable. A cut-off value of 2 separates patients at high and low risk of recurrence [10].

Variables β Coefficient Hazard Ratio Points
Largest diameter, cm

Since 2007, except for emergency transplantation (acute liver failure, primary nonfunction), the French allocation system has been based on a common score called the 'liver score', which essentially takes into account the MELD score and the time spent on the waiting list, with the weight of the latter depending on the indication for transplantation. HCC patients obtain a higher 'liver score' and access LT more quickly than patients with cirrhosis alone, and compete with them after 14 to 18 months on the list even when they have a low MELD score. Another important change in French practice occurred around 2012, at different times depending on the center: the generalized use of 'temporary contraindications' (TCIs) for LT candidates, which essentially means temporary inactivation on the waiting list without losing the allocation points accrued for time spent on the list. Patients with HCC controlled after curative treatment, or exceeding allocation criteria, can be 'temporarily contraindicated' until the LT indication is reassessed (recurrence or successful downstaging).
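The points of the AFP model in Table 1 can be summed as in the minimal sketch below. It assumes the point values published with the original AFP model [10] (largest diameter ≤ 3 cm: 0, 3 to 6 cm: 1, > 6 cm: 4 points; 1 to 3 nodules: 0, ≥ 4 nodules: 2 points; AFP ≤ 100 ng/mL: 0, 100 to 1000 ng/mL: 2, > 1000 ng/mL: 3 points); these values should be checked against Table 1 before any use. `afp_score` and `within_afp_model` are hypothetical names.

```python
def afp_score(largest_diameter_cm: float, n_nodules: int, afp_ng_ml: float) -> int:
    """Sum the AFP-model points for one imaging or explant assessment.

    Point values are assumed from the original AFP model publication [10].
    """
    if n_nodules < 1:
        raise ValueError("at least one nodule is required")
    # Size of the largest nodule
    if largest_diameter_cm <= 3:
        size_points = 0
    elif largest_diameter_cm <= 6:
        size_points = 1
    else:
        size_points = 4
    # Number of nodules
    nodule_points = 0 if n_nodules <= 3 else 2
    # Serum AFP level
    if afp_ng_ml <= 100:
        afp_points = 0
    elif afp_ng_ml <= 1000:
        afp_points = 2
    else:
        afp_points = 3
    return size_points + nodule_points + afp_points


def within_afp_model(largest_diameter_cm: float, n_nodules: int, afp_ng_ml: float) -> bool:
    """Low-risk group: AFP score of 2 or less, the French listing threshold."""
    return afp_score(largest_diameter_cm, n_nodules, afp_ng_ml) <= 2


# A single 5.5 cm nodule with AFP 50 ng/mL scores 1: beyond Milan but within the AFP model.
print(afp_score(5.5, 1, 50), within_afp_model(5.5, 1, 50))
# Four small nodules with AFP 150 ng/mL score 4: outside the AFP model.
print(afp_score(2.5, 4, 150), within_afp_model(2.5, 4, 150))
```

The first example shows the kind of patient the AFP model admits despite exceeding the Milan criteria.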
In this French multicenter retrospective study, our primary aim was to compare agreement with the AFP score on explant analysis before and after the adoption of these criteria. The secondary aim was to compare the overall results of LT, in terms of tumor recurrence, dropout rate, overall survival and disease-free survival, before and after implementation of the AFP model.
We also examined how well the AFP model prioritized patients who would have been excluded by the Milan criteria but who had acceptable outcomes and, conversely, whether it identified patients within the Milan criteria who had inferior outcomes.
Finally, the size of the cohort and the volume of data allowed us to discuss the impact of downstaging, waiting time and response to bridging therapies.

Study Design
All patients listed consecutively with the ABM (Agence de Biomédecine, the French agency for organ sharing) for LT because of HCC between March 2011 and March 2014 in five French centers were included, whether or not they eventually underwent LT, in order to analyze the overall results of LT in an intention-to-treat design. The centers were Centre Hepatobiliaire Paul Brousse (Villejuif), Montpellier, Lille, Lyon and Grenoble. The data cutoff was April 2017. The entire French database could not be analyzed because the data were not accessible.

Patients
A total of 557 consecutive patients were screened for inclusion in this study. Patients whose explants did not reveal HCC, or for whom HCC was not the primary indication for listing, were excluded. We included living-donor grafts (n = 9), domino grafts (n = 11), partial grafts (n = 9), an exceptional graft (n = 1) and expert component allocations (n = 2) in order to keep the study population as close as possible to real patients in daily practice. For the same reason, patients who had spent more than one year in 'temporary contraindication', or who were still in that status, because of an absence of tumor progression were retained in the analysis.
The study was approved by the Ethics Committee (CECIC of Auvergne Rhone Alpes, IRB file 2015-31) and authorized by the CCTIRS and CNIL committees regarding the use of patient data (NCT03156582).

Data Collection
Data were collected retrospectively at each site. Pretransplant data included demographics, cause of cirrhosis, histological data at diagnosis, imaging tumor features at first diagnosis, listing as a salvage transplantation (defined as a curative treatment performed more than one year before listing) and AFP values at diagnosis.
The number and size of tumors and the AFP values officially declared to the ABM were collected prospectively, as were the imaging tumor features at the time of listing.
Pre-LT bridging therapies and their results, the last imaging tumor features and the last AFP values (within 3 months before LT) were collected retrospectively. The choice of bridging therapy could differ slightly according to the radiological or surgical expertise of each local tumor board, but the indication was always the presence of 'active' nodules, i.e., viable tumor tissue on imaging. Response to locoregional therapy was assessed according to the mRECIST criteria, considering the size and number of residual viable tumor foci. A downstaging policy was defined as an attempt to reduce tumor size with locoregional therapies in a patient outside the criteria in use at the time of imaging, with no upper limit such as the UCSF criteria. For homogeneity, successful downstaging was defined as a return to within the Milan criteria, whatever the period.
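The downstaging definitions above can be expressed as two small checks. This sketch assumes the conventional Milan criteria (one nodule ≤ 5 cm, or two or three nodules each ≤ 3 cm, without macrovascular invasion or extrahepatic spread) applied to viable nodules measured as in mRECIST, and assumes that downstaging started outside the Milan criteria. `within_milan` and `downstaging_successful` are hypothetical helpers, not part of the study's software.

```python
from typing import Sequence


def within_milan(viable_diameters_cm: Sequence[float],
                 macrovascular_invasion: bool = False,
                 extrahepatic_spread: bool = False) -> bool:
    """Milan criteria applied to viable nodules (mRECIST-style measurement)."""
    if macrovascular_invasion or extrahepatic_spread:
        return False
    n = len(viable_diameters_cm)
    if n == 0:
        # Assumption: a complete response (no viable tumor) counts as within criteria.
        return True
    if n == 1:
        return viable_diameters_cm[0] <= 5.0
    if n <= 3:
        return max(viable_diameters_cm) <= 3.0
    return False


def downstaging_successful(before_cm: Sequence[float], after_cm: Sequence[float]) -> bool:
    """Outside Milan before locoregional therapy, back within Milan afterwards."""
    return not within_milan(before_cm) and within_milan(after_cm)


# One 6.2 cm nodule reduced to 4.0 cm of viable tumor: successful downstaging.
print(downstaging_successful([6.2], [4.0]))
# Five nodules of which four remain viable: still outside Milan.
print(downstaging_successful([2.0] * 5, [2.0, 1.5, 1.0, 1.0]))
```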
The pathological features of HCCs were collected after LT from the explant reports.
Post-LT follow-up data included death, cause of death, HCC recurrence, date of tumor recurrence and date of last follow-up visit. Tumor recurrence was diagnosed on the basis of imaging reports, histological reports and/or multidisciplinary tumor board reports. Given the intention-to-treat design of our study, recurrences diagnosed as cholangiocarcinoma were included. Seven patients had such a recurrence; all had typical HCC on pre-LT imaging and a cholangiocarcinoma component on explant analysis.

Statistical Analysis
Continuous data are expressed as median and interquartile range. Categorical variables were compared with the chi-squared test, or Fisher's exact test for small samples, and numerical variables with the nonparametric Mann-Whitney U test. Patient survival was first estimated with the Kaplan-Meier method and compared with the log-rank test; the probability of tumor recurrence was analyzed with a competing risk approach, using a Fine and Gray model to account for the competing risk of death. Univariate and multivariate analyses were exploratory and were restricted to variables with clinical relevance to post-transplant survival and HCC recurrence after LT. Variables with p < 0.15 in univariate analysis were entered into a multivariate Cox proportional hazards model to identify independent prognostic factors.
Because baseline characteristics differed between the two groups, a propensity score analysis was performed. The propensity score was built on 17 parameters chosen for their clinical relevance at diagnosis or listing: age, sex, cirrhosis, the presence and number of curative treatments performed before listing to avoid transplantation, the time between diagnosis and listing, the MELD score at listing, the number of tumors, the largest diameter and the total diameter, the AFP value at listing, noncompliance with the AFP model and with the Milan criteria at listing, the presence and number of bridging therapies while awaiting LT, the waiting time for LT, and a downstaging policy. Propensity score matching was then used to evaluate the primary endpoint and to compare survival rates.
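The analyses were run in Stata; purely to illustrate the product-limit (Kaplan-Meier) estimator named above, here is a minimal standard-library Python sketch. It is not the authors' code, handles right-censoring only, and follows the usual convention that patients censored at an event time are still at risk at that time. `kaplan_meier` is a hypothetical name.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) at each distinct time with at least one death.

    events[i] is 1 for death and 0 for censoring at times[i].
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= removed
    return steps


# Five patients followed for 2, 3, 3, 5 and 8 months; one censored at 3 and one at 8.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```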
An independent statistician performed the statistical analyses. Stata software, version 14.2 (StataCorp, College Station, TX, USA), was used for statistical analysis at the Centre d'Investigation Clinique Plurithématique of Centre Hospitalier Universitaire Grenoble Alpes.

Results
After exclusions, the final study population consisted of 523 patients (Figure 1). A total of 364 patients underwent transplantation, whereas 159 did not: 146 dropped out, and 13 had not undergone transplantation at data cutoff because of a complete tumor response.
The MilanCP group comprised patients who either underwent LT or dropped out of the list while the Milan criteria were in use (up to 1 June 2013, the date by which all teams had to re-evaluate their listed patients to conform to the new criteria). This group included 199 patients. The AFP score period (AFPscP) group comprised patients who underwent LT or dropped out of the list after 1 June 2013, i.e., after implementation of the AFP score by the ABM and reassessment of patients in each center. This group included 324 patients.
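The period assignment described above, and the consistency of the reported patient flow, can be checked in a few lines of Python. This is a sketch under assumptions: the text does not say whether an exit on 1 June 2013 itself belongs to the MilanCP, nor how patients still listed at data cutoff were dated, so both choices below are labeled assumptions, and `study_period` is a hypothetical helper. The arithmetic also shows that 34 of the 557 screened patients were excluded.

```python
from datetime import date
from typing import Optional

# Date separating the two periods, as described in the text.
PERIOD_CUTOFF = date(2013, 6, 1)


def study_period(exit_date: Optional[date]) -> str:
    """Classify a listed patient by the date of LT or dropout.

    Assumptions: an exit on 1 June 2013 itself counts as MilanCP ("up to"),
    and a patient still listed at data cutoff (exit_date is None) counts as AFPscP.
    """
    if exit_date is None:
        return "AFPscP"
    return "MilanCP" if exit_date <= PERIOD_CUTOFF else "AFPscP"


# Patient flow as reported in the Results.
SCREENED, INCLUDED = 557, 523
TRANSPLANTED, DROPPED_OUT, COMPLETE_RESPONSE_NOT_TRANSPLANTED = 364, 146, 13
MILAN_CP, AFP_SCP = 199, 324

assert TRANSPLANTED + DROPPED_OUT + COMPLETE_RESPONSE_NOT_TRANSPLANTED == INCLUDED
assert MILAN_CP + AFP_SCP == INCLUDED
print("excluded after screening:", SCREENED - INCLUDED)
```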

Patients' Characteristics
Patients' characteristics are summarized in Table 2. Initial tumor characteristics were similar in both groups. At listing, about 25% of patients were listed under a salvage transplantation policy, and the AFP score was ≤ 2 in 97.5% of cases during the Milan criteria period (MilanCP) compared with 94.8% during the AFP score period (AFPscP) (not significant). The proportion of patients with advanced cirrhosis (Child-Pugh C, MELD > 20) was 13.6% in the MilanCP vs. 7.7% in the AFPscP.
About 34% of patients were included in a downstaging policy, a similar proportion in both periods (p = 0.771), and its success rate, around 50%, was also comparable (p = 0.829). During their time on the waiting list, 57% of AFPscP patients were placed in 'TCI' (temporary contraindication), for a median of 120 days, vs. only 34% of MilanCP patients, for a median of 66 days (p < 0.01). The reasons were a complete tumor response to bridging therapy, progression requiring additional treatment, or other causes requiring reassessment before the patient could be maintained on the list (mainly another cancer or alcohol relapse). In both groups, the last imaging was performed within three months before LT. No difference was found between groups in the number and size of nodules or in the AFP value on the last imaging.
Median waiting time increased from 7.7 months to 12.3 months (p < 0.001). Excluding patients still on the list at data cutoff (12 patients in the AFPscP), the proportion of listed patients who underwent LT was significantly lower in the AFPscP (69.1%) than in the MilanCP (76.4%; p = 0.008). About 8.2% of patients died of postoperative complications, rejection or sepsis in the weeks after LT, with no difference between periods.

Agreement for the Allocation Criteria on Explant Findings
Agreement with the criteria was defined as the number and size of tumors on the explant, combined with the last AFP value, being within the criteria. Agreement with the histological AFP score was good and similar in both groups: 87.5% in the MilanCP and 88.2% in the AFPscP (p = 0.838) (Table 2). No difference was found between the two groups in the number, size or differentiation of tumors or in other adverse histological features. The propensity score analysis confirmed the absence of a significant difference in agreement with the AFP model on explants after adjustment for baseline characteristics (p = 0.449).
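The text does not specify the matching algorithm. As one common option, the sketch below performs 1:1 greedy nearest-neighbour matching without replacement on already-estimated propensity scores, within a caliper. The scores themselves would come from a model such as a logistic regression on the 17 listed covariates; the patient identifiers, scores and the 0.05 caliper are illustrative assumptions, and `greedy_match` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def greedy_match(treated: Dict[str, float],
                 controls: Dict[str, float],
                 caliper: float = 0.05) -> List[Tuple[str, str]]:
    """1:1 greedy nearest-neighbour matching without replacement.

    treated / controls map patient IDs to propensity scores
    (here, for illustration, AFPscP vs. MilanCP patients).
    """
    available = dict(controls)
    pairs = []
    # Match treated patients from highest to lowest score (one common heuristic).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# A3 has no control within the caliper and stays unmatched.
print(greedy_match({"A1": 0.62, "A2": 0.40, "A3": 0.90},
                   {"M1": 0.60, "M2": 0.43, "M3": 0.10}))
```

Greedy matching is simple but order-dependent; optimal matching or matching on the logit of the score are common alternatives.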
Risk factors for being outside the AFP model on explant analysis were assessed (Table 3). On univariate analysis, being outside the Milan criteria on the last imaging, an AFP score > 2 on the last imaging, a high number of bridging therapies and a downstaging policy were significantly associated with being outside the AFP model on the explant. Multivariate logistic regression identified only two independent predictors: an AFP score > 2 at the last evaluation and a downstaging policy. The transplant period had no influence.

Post-LT Tumor Recurrence
Forty-two patients developed tumor recurrence during follow-up: 14 (9.2%) in the MilanCP and 28 (13.2%) in the AFPscP (p = 0.239), although follow-up length differed markedly between the groups. Of these recurrences, seven were cholangiocarcinoma on tumor analysis; only a minority of recurrences were histologically confirmed.
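The reported percentages imply denominators of 152 transplanted patients in the MilanCP (14/152 = 9.2%) and 212 in the AFPscP (28/212 = 13.2%), which sum to the 364 transplanted patients; these denominators are inferred, not stated in the text, and so is the test used. With them, a Pearson chi-squared test without continuity correction, sketched below in standard-library Python, gives p ≈ 0.239, in line with the reported value. `chi2_2x2` is a hypothetical name. This crude comparison ignores the unequal follow-up, which is why the competing risk analysis that follows is more informative.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test without continuity correction for [[a, b], [c, d]].

    Rows: periods; columns: recurrence / no recurrence. Returns (statistic, p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Denominators inferred from the percentages: 14/152 (MilanCP) and 28/212 (AFPscP).
stat, p = chi2_2x2(14, 152 - 14, 28, 212 - 28)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```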
On last-imaging assessment, the recurrence rate did not differ among patients fulfilling the AFP score criteria, whether or not they exceeded the Milan criteria, in either period (Table 4). Among the few patients transplanted despite exceeding the AFP score criteria, recurrence occurred in 37.5% of cases during the MilanCP and in 50% of cases during the AFPscP. Based on explant findings, the recurrence rate was only 4% or 5%, depending on the period, for patients transplanted within both the Milan and AFP score criteria (Table 4), but it was significantly higher for patients within the AFP score but beyond the Milan criteria during the AFPscP (22%) than during the MilanCP (11.8%; p < 0.05). Similarly, among patients exceeding both the AFP score and the Milan criteria, recurrence was significantly higher during the AFPscP (48%) than during the MilanCP (33%; p < 0.05).
The risk of tumor recurrence in a competing risk analysis, with non-HCC-related death as the competing event, was estimated with the Fine and Gray model and showed similar results: subhazard ratio 1.30 (95% CI 0.43-3.98; p = 0.645), i.e., no significantly higher risk in the AFPscP. The 3-year cumulative incidence of tumor recurrence was 5.8% in the AFPscP vs. 4.3% in the MilanCP (Figure 2b).
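The Fine and Gray model is a regression on the subdistribution hazard; the cumulative incidence it describes can be estimated nonparametrically with the Aalen-Johansen estimator, which is what the sketch below computes (it is not the Fine and Gray regression itself). It shows why recurrence is not simply estimated as 1 minus a Kaplan-Meier curve with deaths censored: a patient who dies without recurrence can no longer recur. The event coding (0 censored, 1 recurrence, 2 death without recurrence) and the name `cumulative_incidence` are assumptions.

```python
from typing import List, Tuple


def cumulative_incidence(times: List[float], causes: List[int],
                         cause_of_interest: int = 1) -> List[Tuple[float, float]]:
    """Aalen-Johansen cumulative incidence for one cause.

    causes[i]: 0 = censored, 1 = recurrence, 2 = competing death.
    Returns (time, cumulative incidence) at each time the cause of interest occurs.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # all-cause event-free survival just before the current time
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = d_any = removed = 0
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_interest += 1
            removed += 1
            i += 1
        if d_interest:
            # Probability of reaching t event-free, times the cause-specific hazard at t.
            cif += event_free * d_interest / n_at_risk
            out.append((t, cif))
        if d_any:
            event_free *= 1 - d_any / n_at_risk
        n_at_risk -= removed
    return out


# Recurrences at 1, 3 and 5 months, a competing death at 2, censoring at 4.
print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 1, 0, 1]))
```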

Discussion
Our study shows that agreement with the AFP criteria on the explant did not improve after implementation of the AFP model: 87.5% during the MilanCP compared with 88.2% during the AFPscP (considering viable tumor only). This may reflect the fact that French teams did not strictly comply with the Milan criteria (MC) (during the MilanCP, only 65.8% of patients were within the MC on the explant considering viable tumor, and only 54.6% considering total tumor volume) but intuitively anticipated the application of the AFP model. Agreement with the MC improved slightly, though not significantly, over time (65.8% in the MilanCP and 70.8% in the AFPscP, p = 0.314). This rate is slightly better than, but comparable to, the rates of 47% to 66% reported in the literature [13][14][15][16][17][18].
As expected, an AFP score > 2 on the explant was predictive of tumor recurrence (in univariate and multivariate analysis) and was the only predictor of post-transplant death. Moreover, our analysis of risk factors for tumor recurrence showed that the predictive value of exceeding the AFP score was dynamic: exceeding the AFP score at diagnosis or at listing was not predictive of tumor recurrence, whereas the AFP score on the last imaging and on explant analysis were. This highlights the major role of other factors during HCC management, such as salvage LT and the downstaging policy, which were associated with tumor recurrence in multivariate analysis.
Despite similar post-LT overall survival in the two periods, a worse prognosis was observed for patients listed during the AFPscP, with a significantly lower 3-year post-listing OS after propensity matching in the AFPscP (58.7% (50.5-66.0)) than in the MilanCP (68.8% (61.1-75.4); p = 0.045). This could be explained by a significant increase in the proportion of patients receiving bridging treatment and in the number of treatment procedures during the waiting time, but also by a significant increase in waiting time, significantly longer time spent in temporary contraindication and a nonsignificant increase in dropout during the AFPscP.
Regarding downstaging, our study shows that being part of a downstaging policy was an independent predictor of being outside the AFP score on explant analysis (OR = 5.13, 95% CI 2.45-10.8). Successful downstaging to within the MC yields an acceptable, though higher, tumor recurrence rate, consistent with the results of Yao et al. [19] and many others [20][21][22], but unsuccessful downstaging (which should have been a contraindication to transplantation) was associated with an unacceptable tumor recurrence rate of 57% in the AFPscP. Our message is therefore one of caution towards downstaging, which should remain bound to the MC, since results are slightly worse even within MC and expanded criteria expose patients to higher rates of recurrence [23]. Again, our data suggest worse results in the AFPscP than in the MilanCP in this analysis. In an interesting analysis of downstaging in the United States, Kardashian et al. [22] found that non-downstaged HCC patients receiving LRT had an independently increased rate of HCC recurrence compared with non-downstaged patients not receiving LRT. The hypothesis that LRT may negatively affect HCC outcomes in tumors with poor biology has been raised, mainly because of ischemic damage to the hepatic tumor microenvironment. COX-2 production has been shown to promote the epithelial-to-mesenchymal transition (EMT) and to enhance HCC invasion and metastasis [24]. The higher number of LRTs in the AFPscP could therefore have contributed to the worse results in that period.
A recent article by Di Sandro and colleagues [25] applied the comprehensive assessment of the transplantable tumor proposed by Mazzaferro [26], which is included in the Italian Consensus-Based Approach to Organ Allocation in Liver Transplantation, and concluded that high-risk patients, including those with a partial response to bridging therapies or to downstaging, could benefit from prioritization, with a recurrence rate similar to that of intermediate-risk patients when transplanted quickly after re-staging. Although these results remain controversial (Mehta and Yao [27] proposed a threshold to moderate the risk of selecting tumors with less favorable biology) and need wider validation, they are interesting and, together with our data, suggest that the AFP model could be coupled with a prioritization system that takes into account response to bridging therapies and downstaging, in order to improve the results of LT.
Interestingly, a cholangiocarcinoma component on explant analysis was an independent predictive factor for tumor recurrence (21% of recurrences during the AFPscP and 7% during the MilanCP). This underlines that graft allocation policy is only one of the factors influencing a patient's prognosis; many others are involved and change over time.
Comparing the MC and the AFP score, the high risk of recurrence (22%) in patients exceeding the histological MC despite being within the AFP score during the AFPscP is concerning. This rate was only 11.8% in the MilanCP, but still significantly higher than the 4% recurrence rate of patients within MC.
However, the radiological assessment of the AFP score is more valuable than the histological one because it was collected by a single independent physician. According to these data, there is no difference in recurrence between patients outside and within MC, provided they respected the AFP model; recurrence is around 7% in the MilanCP. Moreover, we again observed a slightly, though not significantly, worse outcome in the AFPscP (whether based on histological or radiological assessment), which we think is due to the period rather than to the criteria. Physician compliance with the radiologic AFP score was excellent, 94% during the MilanCP and 96% during the AFPscP, better than compliance with the MC during the MilanCP (86%). Radiologic assessment of the AFP score is therefore a valuable tool: according to it, the AFP score succeeded in prioritizing patients who would have been excluded by MC but who had acceptable outcomes and, conversely, identified patients within MC who would have had worse outcomes (50% recurrence).
A limitation of our study is its observational, retrospective design, in which the AFP score was calculated from imaging or histological reports without central review. However, the size and number of tumors, as well as AFP every 3 months during the waiting time, were prospectively recorded in the national registry. Another limitation is the absence of data on immunosuppressive treatment after LT; given the retrospective design, these data were considered too complex to interpret correctly, although this factor is relevant to tumor recurrence. Because of the short study period, we assume that immunosuppressive policy did not change greatly in each transplant center.

Conclusions
We found similar agreement with the AFP model before and after its implementation and confirmed the value of the AFP score for predicting tumor recurrence, but we observed a slightly worse prognosis for patients during the recent period, with several hypotheses (longer waiting time, more LRTs) but no clear explanation. The AFP model has recently been evaluated in Italy [28] and in Latin America [14], and the UK LT program discussed these criteria at its National Consensus Meeting. Our study emphasizes the influence of many parameters on recurrence: compliance of teams, criteria for downstaging, median waiting time, subtleties such as 'temporary contraindication' and possibly salvage transplantation. Prospective studies will be needed to assess outcomes under each allocation system.

Informed Consent Statement:
According to the authorizations of CCTIRS (Comité Consultatif sur le Traitement de l'Information en matière de Recherche dans le domaine de la Santé) and CECIC (Comité d'Ethique des Centres d'Investigation Clinique), patient consent was waived because of the retrospective design of the study in severely ill patients, with all data originating from hospital care units where patients had been informed at the beginning of their care.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. They were collected from ABM reports and patient files in each medical center.