Overview of Prognostic Systems for Hepatocellular Carcinoma and ITA.LI.CA External Validation of MESH and CNLC Classifications

Simple Summary: This review provides a comprehensive overview of the main prognostic systems for HCC, classified as prognostic scores, staging systems, or combined systems. Prognostic systems for HCC are usually compared in terms of homogeneity, monotonicity of gradients, and discrimination ability. However, despite the great number of published studies comparing HCC prognostic systems, it is rather difficult to identify a system that could be universally accepted as the best prognostic scheme for all HCC patients encountered in clinical practice. To contribute to this topic, we conducted a study aimed at externally validating the MESH score and the CNLC classification using the ITA.LI.CA database.
Abstract: Prognostic assessment in patients with HCC remains an extremely difficult clinical task due to the complexity of this cancer, in which tumour characteristics interact with the degree of liver dysfunction, the patient's general health status, and a large span of available treatment options. Several prognostic systems have been proposed in the last three decades, from both Asian and European/North American countries. Prognostic scores, such as the CLIP score and the recent MESH score, have been generated on a solid statistical basis from real-life population data, while staging systems, such as the BCLC scheme and the recent CNLC classification, have been created by experts according to recent HCC prognostic evidence from the literature. A third category includes combined prognostic systems that can be used both as prognostic scores and as staging systems. A recent example is the ITA.LI.CA prognostic system, which includes both a prognostic score and a simplified staging system. This review focuses first on an overview of the main prognostic systems for HCC classified according to the above three categories, and, second, on a comprehensive description of the methodology required for a correct comparison between different systems in terms of prognostic performance. In this second section, the main studies in the literature comparing different prognostic systems are described in detail. Lastly, a formal comparison between the most recent prognostic systems proposed for each of the above three categories is performed using a large Italian database including 6882 HCC patients, in order to concretely apply the comparison rules previously described.


Introduction
Ideal staging systems and prognostic scores for cancer management should offer a common scale to provide an accurate prognostic prediction for specific populations as well as for individual patients ("precision medicine"), appropriate selection criteria for treatment that avoid under- and over-treatment, and an optimal design of randomized controlled trials. Conventional staging systems consider only the morphologic features of the tumour, while prognostic scores usually take into account all the main aspects affecting the final prognosis. Staging systems and prognostic scores should be easy to use, reproducible and transportable to different populations in order to be recommended and used on a large scale [1].
Unlike other tumours, hepatocellular carcinoma (HCC) usually arises in the context of another remarkable disease, i.e., liver cirrhosis, making its management unique and more complex with respect to other malignancies. The prognosis of HCC patients is related to three main factors [2]: tumour burden and aggressiveness, degree of liver dysfunction, and the general health status of the patient. Number and size of lesions, vascular invasion and metastatic spread usually define tumour burden. Alpha-fetoprotein (aFP) level has been included in some prognostic systems to describe tumour aggressiveness. Liver function is usually evaluated through biochemical markers (albumin, bilirubin, prothrombin time) and signs and symptoms of liver dysfunction (ascites, encephalopathy, portal hypertension, impaired renal function, hyponatremia), or through the inclusion of multiparametric liver function scores such as the Model for End Stage Liver Disease (MELD) score [3], Child-Pugh score (CPS) [4] or albumin-bilirubin (ALBI) score [5]. The Eastern Cooperative Oncology Group (ECOG) Performance Status (PS) [6] or the Karnofsky index [7] are used to describe the general health of the patient.
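As an illustration of how a multiparametric liver function score condenses laboratory values into a single number, the following minimal Python sketch computes the ALBI score and grade. The coefficients and grade cut-offs are those commonly reported for the ALBI model; they are not stated in this review and should be checked against reference [5] before any use. The function names and the example values are hypothetical.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI linear predictor (assumed published form, see reference [5]).

    bilirubin_umol_l: total bilirubin in micromol/L (must be > 0)
    albumin_g_l: serum albumin in g/L
    """
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score: float) -> int:
    """Map the ALBI score to grade 1 (best) to 3 (worst), assumed cut-offs."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


# Hypothetical patient: bilirubin 10 micromol/L, albumin 40 g/L
example_score = albi_score(10.0, 40.0)
example_grade = albi_grade(example_score)
```

Only two routine laboratory values are needed, which is why this kind of score is attractive as an objective alternative to clinical signs such as ascites or encephalopathy.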
During the last three decades, several staging systems or prognostic scores have been proposed in both Asian and European/North American countries to estimate the prognosis of HCC patients (Figure 1). We can classify these systems/scores into three main categories, based on the methodology by which they were created:
(1) Prognostic scores, derived from real cohort populations.
(2) Staging systems, derived from literature review.
(3) Combined prognostic systems, based on literature evidence but weighted in a real population, and usable both as scores and as staging systems.
In this paper, we first present an overview of the main prognostic systems, classified according to the above three categories, for the general population of HCC patients and, hence, not for specific sub-populations of HCC patients (i.e., early, intermediate, or advanced HCC) or only for specific treatments (i.e., liver transplantation, liver resection, ablation, intra-arterial therapies, or systemic therapies).
A second section is dedicated to a comprehensive description of the methodology required for a correct comparison between different systems in terms of prognostic performance. In this section we also describe the main studies comparing different prognostic systems with the aim of identifying the one with the best performance. Third, in order to concretely apply the comparison rules previously described to the needs of clinical practice, we have performed a formal comparison between the most recent prognostic systems proposed for each of the above-mentioned three categories, using a large Italian database. In particular, we have compared the Taiwanese Model to Estimate Survival for HCC (MESH) [8] score for the prognostic scores category, the Chinese Liver Cancer (CNLC) classification [9][10][11] for the staging systems category, and the Italian Liver Cancer (ITA.LI.CA) prognostic score and staging system [12] for the combined prognostic systems category. The Cancer of the Liver Italian Program (CLIP) score [13,14] and the Barcelona Clinic Liver Cancer (BCLC) classification [15] are also taken as references for the prognostic scores and the staging systems categories, respectively. From this point of view, this study also represents an external validation of the MESH and CNLC prognostic systems.

Prognostic Scores
Table 2 describes the most important data-based prognostic scores, whereas in this paragraph we analyse only the four main ones: the Okuda system [16], the CLIP score [14], the Japanese Integrated Staging (JIS) score [17,18] and the MESH score [8]. The Okuda staging system, proposed in 1984, represents the first attempt to stage HCC including variables aimed at weighting the contribution of cirrhosis to the patient's prognosis [16]. Indeed, it combines the anatomical extension of the tumour (≤ or >50% involvement of the liver) with the degree of liver dysfunction (expressed by albumin, bilirubin, and presence of ascites).
Nowadays, the Okuda system has been progressively abandoned. Its main limitation is the dichotomous view of HCC size, which makes it of little use in modern clinical practice, where a considerable percentage of HCCs are detected before their burden exceeds 50% of the liver volume.
The CLIP score [13,14] was developed through a retrospective cohort study and has been considered an excellent prognostic score, especially because it has been externally validated. Unfortunately, it does not consider the patient clinical status and it is scarcely sensitive in stratifying early HCCs, amenable to curative treatments such as percutaneous ablation or surgical therapies.
Japanese HCC experts proposed the JIS score [18] that combines the Japanese TNM and Child-Pugh (C-P) classifications. This score lacks a strong external validation in European/North American countries, and it is almost exclusively used in Japan.
The MESH score [8] is the most recently proposed data-based prognostic score for HCC (Table 1); it was built using data from 3182 prospectively enrolled patients. This score (ranging from 0 to 6 points) combines Milan Criteria, presence and type of vascular invasion, C-P score, performance status and laboratory parameters (aFP and alkaline phosphatase).
This system, too, does not propose treatment recommendations. However, the MESH score has had at least one external validation in European/North American countries [19].

Staging Systems
As for other cancers, the TNM system [24] is based on tumour pathological features, but it does not consider liver function and does not stratify for the patient's general health condition.
The BCLC classification [15], proposed in 1999, was the first system integrating liver function assessment, tumour extension and also patient general health status. It classifies patients into five subgroups, from 0 to D, and each group is associated with a specific therapy. This classification can be considered an evidence-based system, since it was generated by analysing the results of randomized controlled studies testing a given treatment versus placebo in patients with comparable tumour characteristics and liver function. The BCLC system has been endorsed by the American Association for the Study of Liver Diseases (AASLD), the American Gastroenterological Association (AGA), the European Association for the Study of the Liver (EASL), and the European Organization for Research and Treatment of Cancer (EORTC) [25,26].
Over the years, the BCLC flow chart has been frequently modified. It is not an aim of this study to discuss the pros and cons of the BCLC treatment algorithm [27]. The BCLC suffers from the fact that it was not created and weighted in "real-world" HCC populations. As a result, its prognostic performance is usually lower than that of data-based prognostic scores [28,29]. In addition, some potential limitations of the BCLC structure that could affect the prognostic power of this system are: (a) the absence of a size cut-off for single HCC in the early stage; (b) the high heterogeneity of the intermediate and advanced stages; (c) the absence of a clear distinction between intra- and extra-hepatic vascular invasion; (d) the absence of prognostic biomarkers such as aFP; (e) the excessive prognostic weight given to performance status 1; (f) the poor prognostic stratification of the degree of liver dysfunction (i.e., only a simple distinction between Child-Pugh C and Child-Pugh A-B classes is proposed in the original BCLC scheme).
Finally, the CNLC staging system [9,10] represents the chart endorsed by the National Health and Family Planning Commission of the People's Republic of China for HCC surveillance, diagnosis, staging and treatment (Table 3). These recommendations, released in 2017 and updated in 2019, are a management summary covering all aspects of HCC patient care, delineated by a multidisciplinary panel of more than 50 experts, including surgeons, oncologists, hepatologists, interventional radiologists, etc. This staging system takes into account patient general health status, tumour burden and liver function. It has several similarities with the BCLC system, but it supports more aggressive treatment options for advanced HCC stages. For example, the CNLC system indicates liver resection in patients belonging to the Ia, Ib, and IIa categories, which correspond to the BCLC B stage with 2-3 nodules >3 cm, and also for selected patients classified in stages IIb and IIIa (multinodular and locally advanced HCC) [9][10][11]. The updated 2019 CNLC version [11] is reported in Table 3.

Combined Staging Systems
The Hong Kong Liver Cancer (HKLC) [30] and ITA.LI.CA [12] prognostic systems are the two main examples of combined systems.
The HKLC [30] was developed in 2014, predominantly on a cohort of patients with HBV-related HCC. In this score, performance status, C-P score, tumour status (based on Milan Criteria), and intra- and extra-hepatic vascular invasion or metastases were the predefined criteria, based on literature evidence. These variables were subsequently weighted in a real population in order to assign a relative coefficient to each of them.
The HKLC system can be used both as a prognostic score and as a staging system to help treatment assignment.
Compared with the BCLC classification, the HKLC is better able to prognostically stratify patients assigned to the BCLC intermediate and advanced stages, who can therefore benefit from more aggressive treatments than those recommended by the BCLC system. The pitfall of the HKLC system is the lack of solid external validation in a non-Asian population. Very few studies have compared the BCLC and HKLC systems, and in European/North American populations the latter did not show a better prognostic performance than BCLC [23,30-32].
The ITA.LI.CA prognostic system [12], created in 2016 through a multicentre retrospective analysis and validated in a Taiwanese cohort, is a prognostic model able to efficiently predict the outcomes of HCC patients. It can be used as a prognostic score based on tumour burden, liver function and other patient-related variables (Tables 4 and 5). This system recalls the BCLC classification in the way it stratifies tumour characteristics into different stages, but it provides a better definition of the intermediate stage based on evidence from the literature. In particular, the intermediate stage has been arranged into three sub-groups. A size cut-off was introduced for single tumours to distinguish between stages A and B1. Furthermore, intra- and extra-hepatic HCC vascular invasion were identified as separate entities, also considering that HCC with intra-hepatic vascular invasion is amenable to therapeutic options with radical intent [12]. Patient functional status was evaluated with the C-P score and the ECOG performance status. Lastly, aFP, which provides important prognostic information, has been added. Each variable showed a different impact in determining the final score and, consequently, different points were attributed to the variables in order to correctly weight their prognostic influence. Based on this scheme, the ITA.LI.CA integrated prognostic score has been created. The lowest score (score 0) of the model corresponds to the best prognosis, while the highest one (score 13) depicts the worst prognostic scenario.
In the original study by Farinati et al. [12], this score was internally and externally validated in a large Taiwanese cohort, and more recently, Borzio et al. [33] externally validated the ITA.LI.CA score in an independent multicenter cohort study including 1508 HCC patients. The ITA.LI.CA score has been found to perform better than other scores even in restaging patients at the time of HCC recurrence and before treatment decisions [34].
In conclusion, ITA.LI.CA showed a great ability to predict the prognosis of HCC patients. Moreover, the ITA.LI.CA prognostic system can also be converted into a simple ITA.LI.CA staging to assist treatment allocation [35]. This innovative staging system proposes therapeutic options for each stage based on the so-called "treatment hierarchy", an approach inspired by Precision Medicine and alternative to the "stage hierarchy" concept [34,36].

Summary of the Pros and Cons of Prognostic Systems
Prognostic scores are usually developed from a real-life cohort population using objective and reproducible variables. These systems rely on a rigorous statistical methodology usually based on multivariable survival models derived from a process that is agnostic to known risk factors. This statistical process explains why these scores often have a good prognostic performance. Unfortunately, not all these systems have been internally and externally validated. Moreover, since they are developed from a specific population, their application to the general population is not always feasible and, even more importantly, they do not define tumour stages able to guide treatment selection. Staging systems are usually created by a panel of experts who establish different prognostic stages based on evidence from the scientific literature. The main advantage of these systems is that they offer a potential linkage between HCC stage and treatment. However, they are based on a weak statistical methodology and, for this reason, they usually show a lower prognostic power than prognostic scores.
Combined staging systems are developed from evidence-based composite variables (i.e., Child-Pugh score, tumour features) a priori defined by experts. These composite variables are then weighted in a real population to create the prognostic score. The score is usually also converted to a staging system to allow and facilitate treatment assignment. The theoretical advantage of combined systems is that they allow obtaining, simultaneously and in a balanced manner, a good prognostic evaluation (using the prognostic score) and an appropriate treatment allocation (using the staging system). These features theoretically make combined staging systems more effective and clinically useful than prognostic scores and staging systems categories.
A relevant issue for HCC clinical management is the relationship between prognostic systems and treatment choice [36]. This complex relationship can be analyzed from two points of view. The first is mainly a prognostic point of view. Since treatment selection is influenced by different prognostic variables (i.e., tumour characteristics, liver function, and patient general conditions), there is a statistical interaction between treatment and other variables, so treatment cannot be included as an additive variable in a general prognostic system. From this specific prognostic point of view, therefore, the commonly used prognostic systems described in this paper can be used for a prognostic assessment of the general HCC population, but specific prognostic scores for each treatment should be used to obtain a more accurate prognostic estimation once the treatment decision has been taken [34]. In this review we only describe prognostic systems designed for a general HCC population independently of treatment choice, while treatment-specific prognostic scores are not the object of this study.
The second point concerns the relationship between prognostic systems and treatment assignment. As described in this paper, only the staging and combined systems categories propose treatment algorithms for HCC patients. Several studies in the literature have shown, however, that adherence to these algorithms (i.e., linking treatment choice to a specific stage according to the stage hierarchy philosophy) is very low in everyday clinical practice [31,37-40]. A multidisciplinary evaluation aimed at obtaining a personalized treatment decision is probably the best way to optimize HCC patient outcome. From this perspective, the treatment hierarchy approach is closer than the stage hierarchy to a precision medicine therapeutic approach for HCC [36].

Comparison of Available Prognostic Systems
The performance of a prognostic system is defined by three characteristics: homogeneity, discriminatory ability and monotonicity of gradients [41]. A system is homogeneous when differences in survival between patients of the same stage are small. The discriminatory power is the ability of the system to produce great differences in survival among patients in different stages. When monotonicity of gradients is fulfilled, the survival of patients in each stage is longer than that of patients in the subsequent adjacent stage.
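The definition of monotonicity of gradients given above can be made concrete with a minimal Python sketch. It assumes that the stages are supplied in order from the most favourable to the least favourable and that each stage is summarized by its median survival; the function name and the example values are hypothetical and are not taken from any of the cited studies.

```python
from typing import Sequence


def has_monotonic_gradient(median_survival_by_stage: Sequence[float]) -> bool:
    """Return True if each stage has a longer median survival than the next.

    The sequence must be ordered from the best stage to the worst stage.
    """
    values = list(median_survival_by_stage)
    # Compare every stage with the adjacent, more advanced stage.
    return all(earlier > later for earlier, later in zip(values, values[1:]))


# Hypothetical median survivals (months) for four ordered stages
well_ordered = has_monotonic_gradient([60.0, 36.0, 18.0, 6.0])
# Hypothetical system in which the third stage outlives the second
violated = has_monotonic_gradient([60.0, 36.0, 40.0, 6.0])
```

A check of this kind is purely descriptive: it looks only at point estimates, so in practice it is complemented by formal tests such as the linear trend test.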
These three characteristics are measured using the likelihood ratio (LR) test derived from a Cox regression model, the Akaike Information Criterion (AIC), Harrell's C-index, and the χ² linear trend test (LT). The AIC is calculated from the LR test and it is particularly useful to compare ordinary prognostic systems with a different number of stages/points. A low AIC value (corresponding to a high LR test value) indicates high homogeneity and monotonicity of gradients, while high values of the C-index and LT test indicate high discriminatory ability and monotonicity of gradients [42,43].
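To make two of these measures more tangible, the following Python sketch computes a simplified version of Harrell's C-index and the AIC. It is an illustration under stated assumptions, not the procedure used by any specific statistical package: pairs with tied survival times are skipped (a simplification), a higher score is assumed to mean a worse prognosis, and the AIC is written in its general form, 2k minus twice the maximized log-likelihood, where k is the number of estimated parameters. All names and example values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ranked correctly by the score.

    events[i] is 1 if death was observed and 0 if the patient was censored.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are ignored
        shorter, longer = (i, j) if times[i] < times[j] else (j, i)
        if not events[shorter]:
            continue  # the shorter follow-up is censored: pair not comparable
        comparable += 1
        if risk_scores[shorter] > risk_scores[longer]:
            concordant += 1.0  # higher score died earlier: concordant
        elif risk_scores[shorter] == risk_scores[longer]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion; lower values indicate a better model."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical data: four patients, the third one censored
c_perfect = harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [3, 2, 1, 0])
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because the AIC penalizes each additional parameter, it allows systems with different numbers of stages or points to be compared on the same data.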
Many comparative studies have been conducted with the goal of identifying the system with the best prognostic power in HCC patients. In the study of Marrero et al. [44], the BCLC staging system, as compared with six other prognostic systems, showed the best independent predictive power for survival. The superiority of the BCLC staging system was supported by external validations in Korean [45] and Italian populations [46]. However, in recent years, inherent limitations of the BCLC system have emerged. In 2016, Liu et al. [47] compared 11 staging systems in a large prospective database including 3182 HCC patients. The ability to predict the prognosis was analysed through the homogeneity and the corrected AIC (AICc). This study obtained low AICs for the BCLC system, while the CLIP system proved to be the best prognostic model in all patients as well as in the subsets created according to aetiology and treatment strategy. Therefore, this study showed that a data-based score (CLIP score) performs better than an evidence-based system.
The study by Farinati et al. [12] assessed the prognostic powers of the ITA.LI.CA, BCLC, HKLC, MESIAH, CLIP, and JIS systems. The ITA.LI.CA score showed the best discriminatory ability and monotonicity of gradients in all three study cohorts (training, internal validation and external validation). In particular, the C-index of the ITA.LI.CA score was 0.71 and 0.78 in the internal and external validation cohort, respectively. The LR test indicated that the ITA.LI.CA system had a significantly better discrimination ability (p < 0.001) than the other systems in all three studied groups, and the superiority of the ITA.LI.CA score was also confirmed after stratification for time-period. The ITA.LI.CA prognostic system shows a great ability to predict individual survival in European and Asian populations [12,48].
However, it is worth noting that the prognostic ability depends on several variables, including the time period, the geographical location of the study, the number and type of patients, the modality of comparison and the type of HCC treatment(s) mainly adopted in the analyzed population (Table 6). Indeed, therapeutic management can greatly affect the predictive power of a score, so that the best staging system for HCC patients undergoing liver resection might not be the same as the one showing the best performance in patients receiving palliative treatments or supportive care. As a matter of fact, in a Taiwanese cohort of 2010 patients, survival was better predicted by the Tokyo staging system in patients undergoing liver resection, and by the CLIP score in patients who received chemotherapy or supportive care [49].
The geographical area, Asian or European/North American, can also influence staging system performance through a number of factors, including etiology of liver disease, tumor biology, predominant stage(s) at the time of HCC diagnosis and treatment strategies [50]. In particular, in Asia the TNM, JIS and CLIP systems showed the best predictive power, while in European/North American countries BCLC and CLIP showed the best discriminatory power. Therefore, it is easy to understand why no universal consensus has so far been reached on which prognostic system can be considered the best one [46][47][48][49][50][51].
The ITA.LI.CA system performed better than other multidimensional prognostic systems, even after stratification by curative or palliative treatment. This new system appears to be particularly useful for predicting individual HCC prognosis in clinical practice.

Study Population
With the intent of comparing, in real-life clinical practice, the prognostic performance of the more recent HCC prognostic systems, we used the latest version of the Italian Liver Cancer (ITA.LI.CA) database. This database is a large, multi-centre registry containing prospectively collected data of patients with newly diagnosed or recurrent HCC managed in 23 Italian centres with different levels of expertise (secondary and tertiary referral centres) [77,78]. It currently includes 7816 HCC patients consecutively evaluated and managed from January 1987 to December 2018, and its data are updated every 2 years and periodically revised by the coordinating centre (Semeiotics Unit, Alma Mater Studiorum-Bologna University). The management of the ITA.LI.CA database conforms to the Italian legislation on privacy. According to Italian law, no specific patient approval is needed for any retrospective analysis, but all patients provided written informed consent for every diagnostic and therapeutic procedure, as well as for having their clinical data recorded anonymously in the ITA.LI.CA database. The study was approved by the Institutional Review Board of the ITA.LI.CA coordinating centre, Alma Mater Studiorum University of Bologna (approval number 99/2012/O/Oss), and it was conducted in accordance with the ethical guidelines of the 1975 Declaration of Helsinki.
To limit the potential bias of including patients managed with outdated imaging and treatment tools, we enrolled only the 6882 patients whose HCC was diagnosed between 2000 and 2018.
This database, due to its heterogeneity in terms of tumour stage, underlying liver disease severity, and therapeutic approaches, provides a reliable insight into the characteristics of HCC in a European/North American population and, therefore, it represents an optimal substrate for the external validation of the new MESH [8] and CNLC [9][10][11] scores in real-life clinical practice.

Statistical Analysis
Baseline characteristics were examined based on frequency distribution; continuous data are presented as median (interquartile range) unless otherwise indicated.
Overall survival was defined as the time elapsed from the date of HCC diagnosis to the date of death, last follow-up evaluation, or data censoring (31 December 2019). Kaplan-Meier survival curves were used to estimate the median overall survival (OS) and the 1-, 3-, 5-, and 10-year survival rates. Kaplan-Meier curves were also used to describe the survival of the different stages of each prognostic system, and the log-rank test was used to compare differences in survival. As detailed in Section 3 of this article, we compared the prognostic performance of different HCC prognostic systems in terms of homogeneity, monotonicity of gradients, and discrimination ability using the LT chi-square, the AIC value and the C-index. In addition, the prognostic system with the best prognostic performance was compared with the other systems by using the likelihood ratio test (the higher the test value, the greater the superiority). Missing data for study covariates always involved <10% of patients and were therefore estimated using the Maximum Likelihood Estimation method [79]. A two-tailed p-value < 0.05 was considered statistically significant, and analyses were performed with JMP® Pro 15.
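For readers less familiar with the Kaplan-Meier method mentioned above, the following minimal Python sketch shows the product-limit estimate and how a median OS can be read from it. It is an illustrative re-implementation under simple assumptions (deaths and censorings at the same time are handled by counting the deaths first) and is not the JMP procedure used for the analyses; all names and data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each observed death time.

    events[i] is 1 if death was observed and 0 if the patient was censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival drops to 0.5 or below, if reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up in months; 0 marks a censored patient
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
median_os = median_survival(curve)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is how the method uses incomplete follow-up without biasing the estimate downward.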
All the prognostic systems showed good discrimination ability at the first evaluation based on Kaplan Meier survival figures (Figure 2). Median OS (95% confidence intervals) of different stages/scores are described in detail in Table S2.
Evidence-based systems (BCLC, CNLC) and the ITA.LI.CA simplified staging showed some limitations in the discrimination ability for advanced stages (Figure 2), namely, stages C and D for BCLC, stages IIIa, IIIb, and IV for CNLC, and stages C and D for the ITA.LI.CA simplified staging.
Recently proposed prognostic models, such as the MESH score and the CNLC classification, showed better homogeneity, discriminatory ability and monotonicity of gradients than the older BCLC staging and CLIP score (Table 7). Nevertheless, the ITA.LI.CA score and the ITA.LI.CA simplified staging showed the best prognostic performance among the evaluated systems. In particular, the C statistic of the ITA.LI.CA score in the whole study group was 0.693, a value superior to that of the ITA.LI.CA simplified staging (0.667), MESH (0.662), CNLC (0.661), BCLC (0.659) and CLIP (0.620). According to the likelihood ratio test, the prognostic performance of the ITA.LI.CA score was, once again, better than that of the other systems (p < 0.0001).
The columns of Table 7 report the C-index, the test for trend chi-square and the AIC of the tested prognostic models. The higher the C-index and the test for trend chi-square, the higher the discriminatory ability and monotonicity of gradients. The lower the AIC value, the higher the homogeneity and the monotonicity of gradients. In addition, in the last column the ITA.LI.CA score is compared with the other systems by using the likelihood ratio test: the higher the test value, the greater the superiority of the ITA.LI.CA score over the compared system. Abbreviations: C, concordance; χ², chi square; AIC, Akaike Information Criterion; LR, likelihood ratio; ITA.LI.CA, Italian Liver Cancer; CNLC, Chinese Liver Cancer; BCLC, Barcelona Clinic Liver Cancer; CLIP, Cancer Liver Italian Program.

Conclusions
This review proposes a comprehensive overview of the main prognostic systems for HCC. Data-based prognostic scores, such as the CLIP score and the recent MESH score, being created on a solid statistical basis, generally have a good prognostic performance. However, for the same reason, they show a good prognostic performance in populations similar to the one in which they were generated, while the performance worsens in geographically and ethnically different cohorts. Moreover, they are of limited utility in supporting the therapeutic choice given their intrinsic "score" structure.
Evidence-based staging systems, such as the BCLC system and the recent CNLC classification, are useful in assisting treatment selection since they are usually created as treatment algorithms. However, since their structural variables are not prognostically weighted in real-life populations, they often have a prognostic performance lower than that of genuine prognostic scores. Moreover, they carry the risk of limiting the personalized treatment of HCC patients by strictly linking a given treatment to a specific stage ("stage hierarchy" approach) [36].
Combined prognostic systems are created from evidence-based simple or integrated variables (i.e., Child-Pugh score, tumour features, ECOG performance status, biomarkers such as aFP) that are prognostically weighted in real populations. The main examples of these models are the HKLC and the ITA.LI.CA prognostic systems. These systems have the potential to guarantee a good prognostic performance coupled with the ability to help in the treatment choice. Nonetheless, so far they are still very seldom used in clinical practice.
Prognostic systems for HCC are usually compared in terms of homogeneity, monotonicity of gradients, and discrimination ability. However, despite the great number of published studies comparing HCC staging/scoring systems, it is rather difficult to identify a system that could be universally accepted as the best prognostic scheme for all HCC patients encountered in clinical practice. We conducted a study aimed at externally validating the MESH score and the CNLC classification using the ITA.LI.CA database.
These two new systems confirmed good homogeneity, monotonicity of gradients and discrimination ability also in our large Western HCC population. However, their performance was inferior to that of the ITA.LI.CA score and the ITA.LI.CA simplified staging. Nevertheless, it should be considered that this inferiority could be, at least in part, due to the fact that the comparison was made in the population from which the ITA.LI.CA prognostic model was generated.
The results of our comparison, however, suggest some conclusions as far as survival prediction of HCC patients is concerned. First, modern prognostic systems seem to perform better for HCC patients than the older ones of the same category (i.e., MESH works better than the CLIP score among prognostic scores; CNLC works better than BCLC among staging systems). Second, prognostic scores seem to perform better than staging systems. Third, all currently available prognostic systems have a suboptimal prognostic performance (C-index ≤ 0.7), suggesting that substantial improvements are needed. The inclusion within these systems of biological markers measuring cancer aggressiveness is likely the key factor that will make it possible to reach an optimal prognostic estimation of the survival outcome of HCC patients.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13071673/s1, Table S1: Patient Characteristics in the Study Group, Table S2: Discrimination ability of different HCC prognostic systems. Distribution of patients in different points/stages of the prognostic systems and corresponding observed median survivals. Log-rank test resulted p < 0.0001 for all prognostic systems.

Informed Consent Statement: The management of the ITA.LI.CA database conforms to the Italian legislation on privacy. According to the Italian laws, no specific patient approval is needed for any retrospective analysis, but patients provided written informed consent for every diagnostic and therapeutic procedure, as well as for having their clinical data recorded anonymously in the ITA.LI.CA database.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary Materials.