External Validation of the FSAC Model Using On-Therapy Changes in Noninvasive Fibrosis Markers in Patients with Chronic Hepatitis B: A Multicenter Study

Simple Summary
We externally validated the recently suggested FSAC prediction model for hepatocellular carcinoma (HCC) in treatment-naïve Asian chronic hepatitis B patients starting potent antiviral therapy (AVT). The model reflects age, sex, presence of cirrhosis, and on-therapy changes in non-invasive fibrosis markers (NFMs), such as APRI and FIB-4, after 12 months of antiviral therapy. Our results highlighted better predictive performance for the FSAC model for HCC (Harrell's c-index: 0.770) than the PAGE-B, modified PAGE-B, modified REACH-B, LSM-HCC, and CAMD models, which only use baseline parameters. A simplified version of the FSAC score (i.e., FSAC (2)), including only NFMs at 12 months, also showed a high c-index value (0.763). Our retrospective study suggests that the accurate measurement of intra-hepatic fibrotic burden during adequate AVT is necessary for predicting HCC development.

Abstract
Antiviral therapy (AVT) induces the regression of non-invasive fibrosis markers (NFMs) and reduces hepatocellular carcinoma (HCC) risk among chronic hepatitis B (CHB) patients. We externally validated the predictive performance of the FSAC prediction model for HCC using on-therapy NFM responses. Our multicenter study consecutively recruited treatment-naïve CHB patients (n = 3026; median age, 50.0 years; male predominant (61.3%); cirrhosis in 1391 (46.0%) patients) receiving potent AVTs for >18 months between 2007 and 2018. During follow-up (median 64.0 months), HCC developed in 303 (10.0%) patients. Patients with low FIB-4 or APRI levels at 12 months showed significantly lower HCC risk than those with high NFM levels at 12 months (all p < 0.05). Cumulative 3-, 5-, and 8-year HCC probabilities were 0.0%, 0.3%, and 1.2% in the low-risk group (FSAC ≤ 2); 2.1%, 5.2%, and 11.1% in the intermediate-risk group (FSAC 3–8); and 5.2%, 15.5%, and 29.8% in the high-risk group (FSAC ≥ 9) (both p < 0.001 between each adjacent pair). Harrell's c-index value for the FSAC score (0.770) was higher than those for PAGE-B (0.725), modified PAGE-B (0.738), modified REACH-B (0.737), LSM-HCC (0.734), and CAMD (0.742). Our study showed that the FSAC model, which incorporates on-therapy changes in NFMs, had better predictive performance than other models using only baseline parameters.


Introduction
Chronic hepatitis B virus (HBV) infection is a major public health problem affecting more than 250 million people worldwide; it remains a leading cause of hepatocellular carcinoma (HCC), especially in endemic areas such as Korea [1–5]. The risk of developing HCC has decreased substantially over the past several decades, primarily owing to the use of potent oral nucleos(t)ide analogues with high genetic barriers, that is, entecavir (ETV), tenofovir disoproxil fumarate (TDF), and tenofovir alafenamide (TAF), which can effectively suppress viral replication and reduce necro-inflammation and/or fibrosis [4–6]. Nevertheless, since such highly active antiviral therapy cannot eradicate intra-hepatic HBV itself and the molecular mechanisms of hepato-carcinogenesis are complex [7–9], regular surveillance of patients with chronic hepatitis B (CHB) is recommended to detect early-stage HCC, for which treatment with a curative aim might be possible [10–12].
Many models have been developed to help predict the risk of HCC development among HBV patients with or without AVT. Since the prognostic role of baseline serum HBV-DNA levels has been substantially attenuated in the present era of potent nucleos(t)ide analogues, models established within the past decade (e.g., PAGE-B, modified PAGE-B, and CAMD) have adopted the presence of baseline cirrhosis rather than virological factors [13–15]. Meanwhile, other HCC prediction models using fibrosis parameters assessed during long-term AVT (i.e., modified REACH-B, CAGE-B, and SAGE-B scores) have also been introduced with promising results [16,17]. Most recently, given that baseline fibrosis and/or necro-inflammation can be partially modified or regressed through long-term AVT [18–20], Nam et al. [21] suggested a novel HCC prediction model using fibrosis markers assessed at dual time points, named the Fibrosis marker response, Sex, Age, and Cirrhosis (FSAC) score. The model incorporates on-therapy changes in the fibrosis index based on four factors (FIB-4) [22,23] and the aspartate aminotransferase (AST)-to-platelet ratio index (APRI) [24] at 12 months, as well as sex, age, and cirrhosis. The model has been found to show significantly higher predictive performance for the 10-year prediction of HCC (c-index value of 0.84) than the PAGE-B (0.77), modified PAGE-B (0.80), and REACH-B (0.67) models.
Here, we aimed to externally validate the predictive performance of the newly developed FSAC model in comparison with other risk prediction models assessed at one time-point in an independent HBV cohort treated with ETV, TDF, or TAF.

Study Design and Patient Follow-Up
Treatment-naïve CHB patients (age ≥ 19 years) who received AVT with ETV, TDF, or TAF between January 2007 and December 2018 at Yonsei University Severance Hospital, Gangnam Severance Hospital, and Yongin Severance Hospital were screened for eligibility. Figure S1 depicts the flow of patient recruitment. All patients underwent transient elastography using FibroScan® (EchoSens, Paris, France) at the time of AVT initiation, as described in previous reports. Cirrhosis was histologically or clinically diagnosed as follows: (1) platelet count of <150 × 10³/µL and imaging findings suggestive of cirrhosis, including a blunted, nodular liver edge accompanied by splenomegaly (>12 cm) or (2) clinical signs of portal hypertension, such as gastroesophageal varices [25]. AVT was initiated for patients with CHB or cirrhosis according to the practice guidelines of the Korean Association for the Study of the Liver and the reimbursement guidelines of the National Health Insurance Service of Korea [26].
During follow-up, all patients underwent imaging studies (abdominal ultrasonography) and routine laboratory testing, including serum HBV-DNA, alpha-fetoprotein (AFP), and other viral markers, at 3- to 6-month intervals, as surveillance tests for HCC [5,27–30]. The primary outcome was HCC development, which was diagnosed based on histological evidence or typical radiological findings [14,31–33]. The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Institutional Review Board of Yonsei University Health System, Severance Hospital (IRB No. 4-2020-0491, 22 June 2020). Patient consent was waived due to the retrospective nature of this study.

Non-Invasive Assessment of Fibrotic Burden and Calculation of HCC Risk Scores from Prediction Models
The FSAC score was calculated based on on-therapy changes in non-invasive fibrosis markers (NFMs) at 12 months (NFMR12), as described in the literature (Table S1) [21,22,24]. The NFMR12 was classified into 4 groups. For the FIB-4 index, group A was defined as <3.25 and <1.45; group B as <3.25 and ≥1.45; group C as ≥3.25 and <1.45; and group D as ≥3.25 and ≥1.45 for the baseline and 12-month values, respectively. For APRI, group A was defined as <1.5 and <0.5; group B as ≥1.5 and <0.5; group C as <1.5 and ≥0.5; and group D as ≥1.5 and ≥0.5 for the baseline and 12-month values, respectively [21]. Other HCC-risk prediction models, including PAGE-B, modified PAGE-B, modified REACH-B, LSM-HCC, and CAMD, were also calculated at the time of AVT initiation [13,15,34–36]. A simplified version of the FSAC score (FSAC (2)), which only adopted NFMs at 12 months, was also calculated (Table S1) [21].
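As an illustration only, the two markers and the four-group NFMR12 classification can be sketched in Python. The FIB-4 and APRI formulas are the standard published definitions; the function names, the AST upper limit of normal passed as a parameter, and the lookup tables are assumptions made for this sketch and are not taken from the study's code. The group letters follow the text above as written and should be checked against Table S1 before any use; the points assigned to each group in the FSAC score are not reproduced here.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal of AST) x 100 / platelet count."""
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l

# Cutoffs quoted in the text: (baseline cutoff, 12-month cutoff).
CUTOFFS = {"FIB-4": (3.25, 1.45), "APRI": (1.5, 0.5)}

# Group letters keyed by (baseline >= cutoff, 12-month >= cutoff),
# transcribed from the text as written (hypothetical lookup tables).
GROUPS = {
    "FIB-4": {(False, False): "A", (False, True): "B",
              (True, False): "C", (True, True): "D"},
    "APRI": {(False, False): "A", (True, False): "B",
             (False, True): "C", (True, True): "D"},
}

def nfmr12_group(marker, baseline, month12):
    """Return the NFMR12 group letter for one marker ('FIB-4' or 'APRI')."""
    base_cut, m12_cut = CUTOFFS[marker]
    return GROUPS[marker][(baseline >= base_cut, month12 >= m12_cut)]
```

For example, a patient whose FIB-4 falls from 4.0 at baseline to 1.0 at 12 months would be placed in group C under these definitions.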

Statistical Analysis
Continuous variables are expressed as medians (interquartile ranges [IQRs]) and were compared using Student's t-test or the Mann-Whitney U test depending on their distribution. Categorical variables are expressed as numbers (%) and were evaluated using the chi-squared test or Fisher's exact probability test. The index date was defined as the date of AVT initiation. Patients were censored when they ended follow-up, died without HCC development, underwent liver transplantation, or developed extra-hepatic malignancy. Cox regression analysis was conducted to analyze associations between HCC development and individual risk factors and to calculate hazard ratios (HRs) with 95% confidence intervals (CIs). The cumulative probability of HCC development was evaluated using the Kaplan-Meier method, and differences were assessed with the log-rank test.
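The Kaplan-Meier estimate mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the product-limit estimator, not the software used in the study; the function name and the input layout (follow-up times from the index date, with an event flag of 1 for HCC and 0 for censoring) are assumptions of this sketch.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct event time.

    times  : follow-up time from the index date for each patient
    events : 1 if HCC developed at that time, 0 if the patient was censored
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0   # HCC events observed exactly at time t
        n_leaving = 0  # everyone leaving the risk set at time t
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve
```

The cumulative probability of HCC at a given time is one minus the HCC-free survival S(t) returned by the function.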
The predictive performance of the risk scoring models for HCC development was assessed using Harrell's c-index, the time-dependent area under the receiver operating characteristic curve (TDAUC) at 3, 5, and 8 years from the index date, and the integrated area under the receiver operating characteristic curve (iAUC). Furthermore, lower values of the Akaike information criterion (AIC) were considered to indicate better model performance. Calibration was presented graphically using calibration plots, which compared the model-predicted probability with the observed probability of HCC development. Discrimination and calibration were evaluated using the bootstrap method with 1000 re-samples.
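For readers unfamiliar with Harrell's c-index, the following Python sketch shows the pairwise definition on right-censored data. It is a plain O(n²) illustration under stated assumptions (a pair is comparable only when the patient with the shorter follow-up developed HCC; pairs tied on time are skipped; tied scores count as half-concordant) and is not the implementation used in the study. The function name is hypothetical.

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs in which the patient who
    developed HCC earlier also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event before j's last follow-up
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by time to HCC.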
Statistical differences in parameters of predictive performance between FSAC and the other HCC-risk prediction models were evaluated using the bootstrap method with 1000 re-samples. If the 95% CI of the difference between two models contained zero, there was deemed to be no significant difference in that parameter of predictive performance between the two models.
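The comparison rule described above can be illustrated with a short Python sketch that bootstraps the difference in c-index between two scores measured on the same patients. The text does not state how the interval was constructed, so the percentile interval used here is an assumption, as are the function names, the fixed random seed, and the compact c-index helper defined for self-containment.

```python
import random

def c_index(times, events, scores):
    """Compact Harrell's c-index (ties in score count as 0.5)."""
    num, den = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                if scores[i] > scores[j]:
                    num += 1.0
                elif scores[i] == scores[j]:
                    num += 0.5
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den

def bootstrap_c_index_difference(times, events, scores_a, scores_b,
                                 n_boot=1000, seed=0):
    """Percentile 95% CI for c-index(A) - c-index(B) (assumed method)."""
    rng = random.Random(seed)
    n = len(times)
    diffs = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping each patient's
        # outcome and both scores together.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [times[k] for k in idx]
        e = [events[k] for k in idx]
        try:
            diff = (c_index(t, e, [scores_a[k] for k in idx])
                    - c_index(t, e, [scores_b[k] for k in idx]))
        except ValueError:
            continue  # skip degenerate resamples with no comparable pairs
        diffs.append(diff)
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    m = len(diffs)
    return diffs[int(0.025 * m)], diffs[min(m - 1, int(0.975 * m))]
```

Under the rule in the text, the two models would be judged not significantly different when the returned interval contains zero.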

Baseline Characteristics and HCC Development
According to the enrollment criteria, a total of 3026 treatment-naïve patients were recruited. Their baseline characteristics are shown in Table 1. The median age was 50.0 (IQR 42.0-57.0) years, with a male predominance of 61.3%. In total, 1621, 1325, and 80 patients were treated with ETV, TDF, and TAF, respectively. Cirrhosis was diagnosed in 1391 (46.0%) patients, and positive hepatitis B e antigen (HBeAg) was detected in 1045 (34.5%) patients.
Values are expressed as a number (%) or median (interquartile range). Abbreviations: HBeAg, hepatitis B e antigen; INR, international normalized ratio; AST, aspartate aminotransferase; ALT, alanine aminotransferase.

Predictive Factors of HCC Development
Univariate Cox regression analysis revealed that age, male sex, diabetes mellitus, hypertension, cirrhosis, higher liver stiffness values, lower platelet counts, higher AST, higher ALT levels, and lower serum albumin levels were significantly associated with the development of HCC (all p < 0.05    Table 3). The FSAC model showed significantly higher Harrell's c-index and iAUC values than the other models (Table 4). In terms of other parameters of predictive performance (i.e., TDAUCs at 3, 5, and 8 years), the FSAC model consistently showed significantly higher performance than the PAGE-B, modified PAGE-B, modified REACH-B, LSM-HCC, and CAMD models (Table 4).
Subgroup analysis among patients with cirrhosis (n = 1391, 46.0%) showed that the Harrell's c-index and the iAUC value of the FSAC model were 0.668 (95% CI 0.633-0.701) and 0.661 (95% CI 0.627-0.694), respectively (Table S5), and these values were higher than those of the other models. However, considering the 95% CIs, the FSAC model did not show significantly higher performance than the other models, except for PAGE-B, among patients with cirrhosis (Table S6).
If the 95% CI contains zero, there is no significant difference between the two models. Abbreviations: HCC, hepatocellular carcinoma; CI, confidence interval; TDAUC, time-dependent area under the receiver operating characteristic curve; iAUC, integrated area under the receiver operating characteristic curve; LSM, liver stiffness measurement.

Discussion
Several risk-scoring systems have been proposed to exclude HCC development within about 10 years. Most recently, Nam et al. proposed an upgraded HCC risk prediction model, the FSAC score, using on-therapy responses in NFMs in patients with treatment-naïve CHB. The model has been found to show superior predictive performance over other HCC-risk prediction models that incorporate only baseline factors [21]. In the present multicenter study of an independent large-scale HBV cohort, we confirmed that the prognostic performance of the FSAC model was acceptable, reliable, and consistently superior to that of other HCC risk prediction models, including PAGE-B, modified PAGE-B, modified REACH-B, LSM-HCC, and CAMD.
Our study has several clinical implications. First, the superior prognostic performance of the FSAC model over other HCC-risk prediction models was reproduced. The large sample of >3000 patients with long-term follow-up enhanced the statistical reliability of the results. Moreover, a sufficient number of HCC cases (n = 303, 10.0%) occurred during the median follow-up period of 64.0 months, allowing for acceptable statistical power. Second, among the low-risk group defined according to FSAC score (n = 845), only 6 patients developed HCC, with an annual incidence of 0.15%. Considering that current surveillance strategies to detect early-stage HCC may be cost-effective when annual risk exceeds at least 0.2% in patients without cirrhosis and 1.5% in patients with cirrhosis, the 27.9% of individuals in the low-risk group could likely avoid biannual abdominal ultrasonography-based HCC surveillance safely. Conversely, the intermediate- and high-risk groups, accounting for the remaining 72.1% of this study population, may require more intensive surveillance, given the suboptimal diagnostic sensitivity of abdominal ultrasonography and the overall poor prognosis of HCC detected at advanced stages. Thus, in order to achieve higher detection rates of HCC at early stages and greater cost-effectiveness, further studies on how to implement individualized surveillance strategies based on optimal visit intervals and the adoption of novel diagnostic modalities using radiology and/or serum biomarkers are warranted.
Notably, we found that cumulative HCC risk tended to be affected more by the NFM value at 12 months itself than by NFMR12, even though on-therapy changes in NFMs were also effective in predicting HCC development. This suggests that overestimation of the baseline fibrotic burden by FIB-4 and APRI, in patients with elevated AST due to necro-inflammatory activity before starting AVT, may exaggerate the apparent degree of NFM regression after AVT. This hypothesis is supported by the observation that the simplified version of the FSAC model (i.e., FSAC (2)), which included only NFMs assessed at 12 months, also showed a Harrell's c-index value similar to that of the FSAC model (0.763, 95% CI 0.737-0.787). The observations in this study are consistent with reports of excellent predictive performance for the CAGE-B and SAGE-B models, which incorporate liver stiffness values on transient elastography after 5 years of AVT.
Although the predictive performance of the FSAC model in our study population was acceptable, further research using novel biomarkers that can exclude the overestimation caused by active necro-inflammation before starting AVT (e.g., three-dimensional magnetic resonance elastography) is needed [37] to enhance prognostic performance for general use in routine practice. Furthermore, along with the dynamic changes in APRI or FIB-4 index during AVT, assessment by transient elastography, which proved to have higher predictive efficacy after 12 months as an easier predictive algorithm, might give useful information. Indeed, when we stratified the risk of HCC development by the on-treatment liver stiffness (LS) value (cutoff value: 6.4 kPa) in the subgroup with available paired transient elastography (TE) results (n = 1102, 36.4%) [18], we confirmed a significant association (p < 0.001). Hence, further large-scale studies with serial follow-up of transient elastography during long-term AVT are needed.
Our study has several limitations. First, although liver stiffness values by transient elastography were available in all patients at baseline, approximately two thirds of patients did not undergo transient elastography during long-term AVT, primarily because it is not reimbursed by the National Health Insurance Service in Korea. Since predictive performance might vary depending on the sample size and HCC incidence, further studies are required in the setting of paired transient elastography tests. Second, when the HCC prediction models were assessed among a subgroup with cirrhosis, their prognostic performance was generally attenuated. This is most likely because the discriminatory power of the variables constituting the models (e.g., platelet counts, fibrosis scores, presence of cirrhosis, or reduced liver function) might become considerably attenuated in the relatively "homogeneous" cirrhotic subgroup. Given that the majority of HCC arises in the setting of cirrhosis, especially among patients in Western countries, further studies on the development of novel biomarkers are needed to allow optimized prognostication in the subgroup with cirrhosis.

Conclusions
In this external validation study of a large-scale cohort, the FSAC model exhibited more robust predictive performance for HCC development, compared to other HCC-risk prediction models that use only baseline characteristics. For predicting HCC development, accurate measurement of fibrotic burden during long-term AVT is necessary.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers14030711/s1, Figure S1: The Flowchart of patients' enrollment, Table S1: Definition of components which constitute FSAC and FSAC (2), Table S2: Classification of patients according to non-invasive fibrosis marker response at 12 months after antiviral therapy for chronic hepatitis B, Table S3: Cox regression analysis for HCC development, Table S4: Comparison of predictive performance between the FSAC and FSAC (2) models, Table S5: Predictive performance of the FSAC and other risk-prediction models among patients with cirrhosis (n = 1391), Table S6: Comparison of predictive performance between the FSAC and other HCC risk-prediction models among patients with cirrhosis (n = 1391).