FibroTest for Evaluating Fibrosis in Non-Alcoholic Fatty Liver Disease Patients: A Systematic Review and Meta-Analysis

(1) Background: FibroTest™ is a multi-marker panel, suggested by guidelines as one of the surrogate markers with acceptable performance for detecting fibrosis in patients with non-alcoholic fatty liver disease (NAFLD). Several studies evaluating this test have been published since the guidelines appeared. This study aims to produce summary estimates of FibroTest™ diagnostic accuracy. (2) Methods: Five databases were searched for studies that evaluated FibroTest™ against liver biopsy as the reference standard in NAFLD patients. Two authors independently screened the references, extracted data, and assessed the quality of included studies. Meta-analyses of the accuracy in detecting different levels of fibrosis were performed using the bivariate random-effects model and the linear mixed-effects multiple thresholds model. (3) Results: Of the ten included studies, seven were eligible for inclusion in our meta-analysis. Five studies were included in the meta-analysis of FibroTest™ in detecting advanced fibrosis and five in significant fibrosis, resulting in an AUC of 0.77 for both target conditions. The meta-analysis of three studies resulted in an AUC of 0.69 in detecting any fibrosis, while analysis of three other studies showed higher accuracy in detecting cirrhosis (AUC: 0.92). (4) Conclusions: Our meta-analysis showed acceptable performance (AUC > 0.80) of FibroTest™ only in detecting cirrhosis. We observed more limited performance of the test in detecting significant and advanced fibrosis in NAFLD patients. Further primary studies with high methodological quality are required to validate the reliability of the test for detecting different fibrosis levels and to compare the performance of the test in different settings.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is a potentially progressive disorder, representing a wide spectrum of disease that ranges from simple steatosis to non-alcoholic steatohepatitis (NASH) with different degrees of fibrosis, and eventually cirrhosis. With a global prevalence of approximately 25%, NAFLD is now the leading cause of chronic liver disease worldwide and a growing challenge to public health [1,2]. NAFLD is the hepatic manifestation of metabolic syndrome and is strongly associated with obesity, especially when combined with insulin resistance [3][4][5]. This association explains the high prevalence of NAFLD among people with a body mass index (BMI) over 30 kg/m² worldwide [5,6].
There is evidence that liver fibrosis stage is the most potent prognostic factor for NAFLD patients. Any progress in the fibrosis stage (from less than F2 to F3 and F4) is associated with a higher risk of long-term outcomes and an increase in liver-related mortality [7][8][9][10][11]. Hence, early identification of NAFLD patients with advanced fibrosis (F3/4) is recommended by international guidelines [12][13][14] and is a key area of interest for clinical trial recruitment [15].
Liver biopsy is currently recommended as the reference standard for detecting NASH and hepatic fibrosis [16]. However, its suitability for diagnosis in clinical practice or in drug development has been questioned, because of the costly and invasive nature of this procedure, and the risk of potentially severe or even fatal complications [17][18][19]. The limitations of the liver biopsy have fueled the development of non-invasive NAFLD biomarkers.
FibroTest™ (FibroSURE in the US) is a panel of biochemical markers that was originally developed for the assessment of bridging fibrosis in patients with hepatitis C (HCV) [20]. It has since been evaluated in other chronic liver diseases, including hepatitis B (HBV) [21,22], alcoholic liver disease (ALD) [23,24], and NAFLD [25].
The EASL-EASD-EASO Clinical Practice Guidelines recommend FibroTest™ as a non-invasive biomarker with acceptable diagnostic accuracy, as defined by an area under the receiver operating characteristic curve (AUC) > 0.80 for detecting fibrosis and NAFLD progression [16]. This guideline, published in 2016, referred to only two studies for this recommendation [25,26]. By now, more studies that evaluated the accuracy of this biomarker in NAFLD patients have been published, with varying levels of performance. It is unclear whether the basis for the recommendation in the guideline still corresponds to the current body of evidence.
To provide more precise summary estimates of clinical performance and to explore likely sources of variability in the reported test accuracy, we performed a comprehensive systematic review and meta-analysis of studies of the performance of FibroTest™ in detecting any fibrosis (F1-F4), significant fibrosis (F2-F4), advanced fibrosis (F3-F4), or cirrhosis (F4) in NAFLD patients.

Materials and Methods
This study was conducted as part of LITMUS (Liver Investigation: Testing Marker Utility in Steatohepatitis), a large multi-center project funded by the European Union IMI2 scheme. This project aimed to evaluate a set of biomarkers for detecting NASH and fibrosis in NAFLD patients. The protocol of the full systematic review is available in PROSPERO (CRD42018106821). The report of this study was prepared using the PRISMA-DTA statement [27] (see Supplementary Table S1).

Search Methods
Using a comprehensive and sensitive search strategy, developed in close collaboration with our search specialist (R.S.), we searched five electronic databases: Medline (via OVID), EMBASE (via OVID), PubMed, Science Citation Index, and CENTRAL (The Cochrane Library). Our search strategy looked for terms in the title/abstract or across the record and in the medical subject headings (MeSH). This strategy was initially run against the databases in August 2018, and it was updated in May and December 2019 and 2020. We limited our search to human subjects but applied no further restrictions based on either year or language (see Supplementary Table S2 for the Medline search strategy).
In addition, we manually scrutinized the reference lists of related systematic reviews and of the articles reporting all included studies to identify additional studies. We also contacted academic and industry partners within the LITMUS consortium to identify any potentially missed publications.

Index Biomarker
The index test for this review is FibroTest™ (Biopredictive, Paris, France), also known as FibroSURE (LabCorp, Burlington, NC, USA), its brand name in the USA. This panel consists of serum α2-macroglobulin, apolipoprotein A1, haptoglobin, total bilirubin, and gamma-glutamyl transpeptidase (GGT), adjusted for age and gender. It provides a quantitative estimate of liver fibrosis ranging from 0 to 1, with 0 corresponding to F0 and 0.74 marking the threshold for F4 [28].
ActiTest, another test from Biopredictive Paris, which includes the same components plus alanine-aminotransferase (ALT), is used for assessment of necro-inflammatory activity; this test, however, was not included in this systematic review [29].

Study Design and Participants
Studies reported in full-text articles or as conference abstracts were potentially eligible if they had enrolled adult patients (≥18 years) with biopsy-proven or suspected NAFLD and reported FibroTest™ results (the index test) with paired liver histology data as the reference standard. We did not include studies that had recruited patients with other chronic liver conditions (e.g., viral hepatitis) or decompensated cirrhosis, and we only considered studies in patients with mixed etiologies if the performance of FibroTest™ in NAFLD patients was reported separately. We excluded from our meta-analysis studies without enough data to calculate diagnostic accuracy estimates.

Selection of Studies
We removed duplicate records of our initial search results. All remaining titles were screened by one reviewer (Y.V.), while a second reviewer independently screened 10% of the titles. The abstracts and full-text reports of all potentially eligible studies were independently screened by two authors (Y.V. and J.L.). A decision on inclusion was reached in discussions with the third reviewer (M.H.Z.) in case of any disagreement. The Rayyan software (https://rayyan.qcri.org, accessed from August 2018 to January 2021) was used in the screening phase of this study.

Data Extraction
The following data were extracted by one author (Y.V./J.L.) and cross-checked by the other author (J.L./Y.V.): study group characteristics, index test and reference test features, and number of true and false positives and true and false negatives for constructing classification tables. The corresponding authors of studies that did not report sufficient data for reconstructing contingency tables were contacted.
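When a study reports only sensitivity, specificity, and the number of patients with and without the target condition, the classification table can often be back-calculated. The sketch below is illustrative only and assumes that the reported proportions were computed on exactly these group sizes. Because of rounding, reconstructed counts may differ by one from the unpublished originals, which is one reason for contacting authors. The function name is hypothetical.

```python
def reconstruct_2x2(sens, spec, n_diseased, n_non_diseased):
    """Back-calculate (TP, FP, FN, TN) from reported accuracy and group sizes.

    Assumes sens and spec were computed on exactly these group sizes.
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    tp = round(sens * n_diseased)          # diseased patients testing positive
    fn = n_diseased - tp                   # diseased patients testing negative
    tn = round(spec * n_non_diseased)      # non-diseased patients testing negative
    fp = n_non_diseased - tn               # non-diseased patients testing positive
    return tp, fp, fn, tn


# Hypothetical study: sensitivity 0.80, specificity 0.70,
# 100 patients with and 200 without the target condition
print(reconstruct_2x2(0.80, 0.70, 100, 200))
```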

Assessment of Methodological Quality
We used the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) to assess the risk of bias and concerns about applicability in included studies [42]. The risk of bias of each of the four main domains of the review-specific QUADAS-2 tool was evaluated by two authors (Y.V. and J.L.) independently, and a judgment of 'low', 'high', or 'unclear' risk was assigned to each study.

Statistical Analysis
Classification tables were extracted or reconstructed for expressing the diagnostic accuracy of FibroTest™, for each pre-defined target condition. Study-specific estimates of sensitivity, specificity, and corresponding 95% confidence intervals (CI) were generated and graphically illustrated in forest plots. Accuracy data were extracted for reported thresholds. Variability was assessed based on visual assessment of these forest plots and ROC curves.
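Study-level estimates can be recomputed from each classification table. The paper does not state which interval method was used. The sketch below assumes the Wilson score interval, one common choice for binomial proportions, and uses only the Python standard library. The function names and the example table are illustrative.

```python
import math


def wilson_ci(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def study_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a Wilson 95% CI, from one 2-by-2 table."""
    sens = tp / (tp + fn)   # proportion of diseased patients testing positive
    spec = tn / (tn + fp)   # proportion of non-diseased patients testing negative
    return {
        "sensitivity": (sens, *wilson_ci(tp, tp + fn)),
        "specificity": (spec, *wilson_ci(tn, tn + fp)),
    }


# Hypothetical table: TP=80, FP=60, FN=20, TN=140
print(study_accuracy(80, 60, 20, 140))
```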
In this systematic review, we did not attempt to formally evaluate publication bias since funnel plot asymmetry, the usual test in reviews, cannot discriminate between publication bias and other sources of asymmetry in systematic reviews of test accuracy studies [43].
Since studies could report a single threshold or multiple threshold values for different target conditions, we applied two different meta-analytical methods, depending on the number of reported threshold values. The bivariate random-effects model (mada package in R) was applied to calculate summary estimates of sensitivity, specificity, and predictive values for studies that reported only one threshold value. Because this method is based on a single threshold, predictive values cannot be calculated at alternative threshold values.
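The bivariate model pools logit-transformed sensitivity and specificity jointly, including their between-study correlation, and mada fits it with likelihood-based methods. Reproducing that takes more than a short example. As a deliberately simplified stand-in, the sketch below pools a single proportion (for example, sensitivity) across studies on the logit scale using the univariate DerSimonian-Laird random-effects estimator. It ignores the correlation between sensitivity and specificity, so it is not the method used in this review, and its results will generally differ from those produced by mada. Function names and example counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def pool_logit_dl(events, totals, z=1.959963984540054):
    """Univariate DerSimonian-Laird random-effects pooling of proportions.

    events: per-study counts (e.g. true positives); totals: per-study
    denominators (e.g. TP + FN). Returns (pooled, ci_low, ci_high, tau2),
    with the proportion and CI back-transformed from the logit scale.
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:               # 0.5 continuity correction for an empty cell
            x, n = x + 0.5, n + 1.0
        ys.append(logit(x / n))
        vs.append(1 / x + 1 / (n - x))     # approximate within-study variance of the logit
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))      # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1 / (v + tau2) for v in vs]                            # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2


# Hypothetical true-positive counts and diseased totals from three studies
print(pool_logit_dl([40, 45, 30], [50, 60, 50]))
```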
When a single study reported sensitivity and specificity at multiple threshold values, we used a linear mixed-effects model (diagmeta package in R). This multiple thresholds model enabled us to obtain summary estimates of sensitivity and specificity at different cut-points and to calculate predictive values depending on the prevalence of the target condition of interest [44,45]. Optimal thresholds were determined by maximizing Youden's J statistic (also called Youden's index): the sum of sensitivity and specificity minus 1. In our calculations of predictive values, we used a range of prevalences based on the prevalence observed in our included primary studies. We further evaluated the FibroTest thresholds required to achieve pre-specified high values of sensitivity and specificity. For each meta-analysis, SROC curves were constructed to represent the overall diagnostic accuracy of the index test in detecting the corresponding target condition.
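Youden's J can be illustrated directly on a list of (threshold, sensitivity, specificity) triples. The sketch below simply picks the reported point with the largest J. The multiple thresholds model instead locates the optimum on its fitted sensitivity and specificity curves, so this shows the statistic, not the model. The values in the example are hypothetical.

```python
def youden_j(sens, spec):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sens + spec - 1


def best_threshold(points):
    """Return (threshold, J) for the (threshold, sens, spec) triple with maximal J."""
    threshold, sens, spec = max(points, key=lambda t: youden_j(t[1], t[2]))
    return threshold, youden_j(sens, spec)


# Hypothetical (threshold, sensitivity, specificity) triples
points = [(0.20, 0.90, 0.42), (0.30, 0.80, 0.60), (0.50, 0.50, 0.85)]
print(best_threshold(points))
```

Here the J values are 0.32, 0.40, and 0.35, so the 0.30 threshold is selected. A higher-sensitivity threshold such as 0.20 is not "optimal" by J, even though it may be preferred for ruling out disease.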
We calculated 95% confidence intervals and 95% prediction intervals around estimates, where appropriate. The confidence interval around the mean represents the range of values that are still plausible, given the included studies. A prediction interval also includes the between-study heterogeneity and refers to plausible values that a future single primary accuracy study of the same index test, for the same target condition, may generate.
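For a single logit-transformed quantity, the difference between the two intervals can be written explicitly. The expressions below are a sketch under the usual univariate random-effects assumptions, using a commonly cited approximation for the prediction interval, with $\hat{\mu}$ the pooled logit estimate, $\hat{\tau}^2$ the between-study variance, and $k$ the number of studies. The bivariate and multiple thresholds models used here report joint confidence and prediction regions instead, so this is illustrative only:

```latex
\text{95\% CI:}\quad \hat{\mu} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\mu})
\qquad
\text{95\% PI:}\quad \hat{\mu} \pm t_{k-2,\,0.975}\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```

Both limits are back-transformed with $\operatorname{expit}(x) = 1/(1+e^{-x})$. Because $\hat{\tau}^2$ enters only the prediction interval, and $t_{k-2,\,0.975} \geq 1.96$, the prediction interval is always at least as wide as the confidence interval. It is much wider when heterogeneity is large or when only a few studies are available.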
A minimally acceptable performance level of 0.80 (for sensitivity, specificity, and AUC) was predefined for FibroTest™, a level that would exceed the performance of other NAFLD-related fibrosis screening and diagnostic biomarkers [16]. R for Windows (Version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria) was used in all analyses.
We investigated the influence of type of scoring method for fibrosis staging in a sensitivity analysis, by removing a study that used METAVIR criteria from meta-analysis for advanced fibrosis [26]. We also assessed the effect of the scoring method type on the meta-analysis for significant fibrosis, by comparing the results before and after adding the single study using the METAVIR system.

Search Results
After removing duplicates from the 9066 references found by the initial search, we screened 6220 titles, 778 abstracts, and 265 full-text reports. We found 18 studies after searching other sources. Fifteen potentially eligible studies reporting on the accuracy of FibroTest™ in NAFLD patients were evaluated in the first search, and we found five more eligible studies after updating the search. In total, ten studies met all of our inclusion criteria and were included in our systematic review, and seven of them could be included in our meta-analysis (Figure 1). See Supplementary Table S5 for reasons for exclusion.

Study Characteristics
The characteristics of the ten studies that met our inclusion criteria are summarized in Table 1. The mean or median age of the participants included in these studies ranged from 42.2 to 57 years. Six studies had included NAFLD patients with a mean BMI ≥ 25 kg/m², and one study had recruited morbidly obese participants (BMI ≥ 35 kg/m²) [26]. The percentage of patients with diabetes was reported by seven studies and ranged from 23% to 100%.
Table 1 notes: N, total number of patients; NR, not reported; DM, diabetes mellitus; ALT, alanine aminotransferase; AST, aspartate aminotransferase. α Median. * Three studies were excluded from the meta-analysis due to lack of information on cut-offs and/or data for reconstructing 2-by-2 tables. ¥ Data on advanced fibrosis were also reported in this paper; however, the most recent data reported by the authors in the same population were selected for the meta-analysis.
The included studies differed in the histological scoring system for staging fibrosis in NAFLD patients. The majority of studies utilized the NASH CRN scoring system, while one study relied on METAVIR criteria [26]. The scores from this study were converted to their NASH CRN equivalent for the meta-analysis [53] (see Supplementary Table S6 for correspondence between the NASH CRN and the METAVIR systems).
Not all studies provided detailed information about the biopsy. All except one [49] reported the length of the biopsy samples, which ranged from 13.8 to 27 mm. None of the studies reported the needle gauge. Biopsy samples had been evaluated by a single pathologist in five studies [25,46,48,51,52], and one study reported evaluations by a pathology consortium and centralized pathologists [26]. One other study did not report information regarding the histology assessment [49]. Only one study had relied on hepatopathologists for their histopathology assessments [48]. More details about the biopsy characteristics of each study are available in Supplementary Table S7.

Methodological Quality Assessment
The methodological quality assessment results are summarized in Supplementary Figures S1 and S2. Only one study was judged to be at low risk of bias in all four domains of the QUADAS-2 instrument [48]. Four studies were scored at a high risk of bias for the patient selection domain [25,26,46,49]. In three of them, healthy controls had been included [25,46,49]. The study reports did not specify whether the participants had been enrolled as a consecutive series, based on random sampling, or as a convenience sample [25,46,49]. In one other study, the analysis was performed on data collected in three validation studies, performed independently, with considerable heterogeneity in the respective study populations [26].
Three studies had included only diabetic patients or morbidly obese patients, qualifying for bariatric surgery. Due to these inclusion criteria, the respective study groups are not representative of the general population of suspected NAFLD patients. We therefore had applicability concerns in the patient selection domain for a number of studies [25,26,46,49,52].
Three studies did not report whether the threshold value for FibroTest™ was prespecified before data analysis. We scored the index test-related risk of bias for these studies as "unclear" [25,51,52]. One study failed to report whether the data evaluation process was blinded, and that study also did not report the time interval between the sample collection and biopsy [49]. We scored the risk of bias of this study as "unclear" for both the reference standard and the flow and timing domain.

Accuracy of FibroTest™ for Detecting Advanced Fibrosis (≥F3)
In total, five studies were included in the meta-analysis of the diagnostic accuracy of FibroTest™ in detecting advanced fibrosis. These studies had recruited 2103 NAFLD participants, of whom 551 had advanced fibrosis. Three of the studies reported the performance of the test at a single threshold, and the other two reported accuracy data at multiple thresholds (see Supplementary Figure S3). The summary AUC was 0.77.
We used the multiple thresholds model to calculate expected positive and negative predictive values for desired levels of sensitivity and specificity, examining advanced fibrosis prevalences between 10% and 50%. The results are shown in Table 2. By fixing sensitivity at high values (from 0.90 to 0.98), we observed corresponding specificity values ranging from 0.42 to 0.13, at threshold values of 0.20 to 0.12. As expected, the projected positive predictive values (PPVs) were higher in settings with a higher disease prevalence, whereas the estimated negative predictive values (NPVs) were acceptable (≥0.80) at all prevalences. Conversely, setting specificity at values between 0.90 and 0.98 resulted in sensitivities between 0.37 and 0.09, at threshold values of 0.48 to 0.86, and acceptable PPVs (≥0.80) only in settings with a disease prevalence of 50%. Figure 3 illustrates the corresponding PPV and NPV for different thresholds, based on the multiple thresholds model using all available information for advanced fibrosis.
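Predictive values follow from sensitivity, specificity, and prevalence by Bayes' theorem. The sketch below applies that formula to the two high-sensitivity pairs quoted above (0.90/0.42 and 0.98/0.13) at the two ends of the examined prevalence range. It is a plain calculation, not the multiple thresholds model, so its figures may differ slightly from those in Table 2. The function name is illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem)."""
    tp = sens * prevalence               # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)   # expected false-positive fraction
    fn = (1 - sens) * prevalence         # expected false-negative fraction
    tn = spec * (1 - prevalence)         # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity/specificity pairs quoted in the text for advanced fibrosis
for sens, spec in [(0.90, 0.42), (0.98, 0.13)]:
    for prev in (0.10, 0.50):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"sens={sens:.2f} spec={spec:.2f} prev={prev:.0%}: "
              f"PPV={ppv:.2f} NPV={npv:.2f}")
```

At 10% prevalence, both pairs give NPVs above 0.97 but PPVs of only about 0.11 to 0.15. This illustrates why a high-sensitivity threshold is better suited to ruling out advanced fibrosis than to ruling it in.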
J. Clin. Med. 2021, 10, x FOR PEER REVIEW

Accuracy of FibroTest™ for Detecting Significant Fibrosis (≥F2)
Five studies, reporting on 1788 NAFLD participants, of whom 952 had significant fibrosis, reported the performance of FibroTest™ in detecting significant fibrosis (see Supplementary Figure S4 for the forest plots at reported thresholds, ranging from 0.30 to 0.75). The resulting summary AUC was 0.77. See Figure 4a for the SROC curve with the corresponding 95% confidence region and prediction region.

Accuracy of FibroTest™ in Detecting Cirrhosis (F4 vs. F0-F3)
Three studies (1370 participants, 177 with cirrhosis) reported on the accuracy of FibroTest™ in detecting cirrhosis. The proportion of study participants with cirrhosis ranged from 10% to 13%. The studies reported accuracy at different cut-offs: 0.57, 0.74, and 0.75. The summary estimate of the AUC was 0.92. See Supplementary Figure S5 for the forest plots of these studies and Figure 4b for the SROC curve.

Accuracy of FibroTest™ in Detecting any Fibrosis (F1-4 vs. F0)
Three studies reported the performance of FibroTest™ for detecting any level of fibrosis in NAFLD patients. These studies had recruited 1583 participants, of whom 1214 had some level of fibrosis. The cut-offs for the accuracy estimates differed: 0.27 was used in two studies and 0.26 in one. See Supplementary Figure S6 for the forest plots. Our summary estimate of the AUC was 0.69 (see Figure 4c).

Sensitivity Analysis
We conducted sensitivity analyses to examine the impact of the different scoring systems for staging liver fibrosis on the meta-analytic findings. Removing the one study with METAVIR criteria from the meta-analysis for advanced fibrosis did not meaningfully affect the results (AUC: 0.77, sensitivity: 0.73, and specificity: 0.69).
We also did not observe any substantial differences when comparing the results of the meta-analysis for significant fibrosis with and without this study. Including it resulted in a slightly lower AUC (0.71 vs. 0.77) and small changes in the summary estimates of sensitivity (0.49 vs. 0.56) and specificity (0.82 vs. 0.77) at the corresponding threshold values projected by the multiple thresholds model.
Figure 3. Positive and negative predictive values of FibroTest™ for advanced fibrosis at different thresholds, based on the multiple thresholds model. Each colored line represents a different prevalence setting, ranging from 10% to 50%.

Figure 4. SROC curves of FibroTest™ for detecting significant fibrosis (a), cirrhosis (b), and any fibrosis level (c). In each plot, the circle (summary point) represents the summary estimate of sensitivity and specificity. Each triangle represents a single threshold value reported by an included study. The solid ellipse represents the 95% confidence region, and the dotted ellipse represents the prediction region. AUC: area under the receiver operating characteristic curve.

Summary of Main Findings
In this systematic review, summarizing the evidence from seven studies, FibroTest™ did not meet the minimally acceptable performance level in detecting significant (≥F2), advanced (≥F3), or any fibrosis (AUC: 0.77, 0.77, and 0.69, respectively). In comparison, the diagnostic accuracy of the test in detecting cirrhosis (F4) was more promising, demonstrating an AUC of 0.92.
In the meta-analysis of advanced fibrosis, where studies reported different thresholds, we could use the multiple thresholds model to calculate negative and positive predictive values for a range of disease prevalences at predefined sensitivity and specificity values. This analysis showed that, with sensitivity fixed at 0.90 or above, the test could yield high NPVs (>90%) in low-prevalence settings, such as primary and secondary care, but with relatively low PPVs (11-61%).

Strengths and Limitations of the Review
FibroTest™ is a continuous linear biochemical assessment of NAFLD progression, providing a quantitative estimate of liver fibrosis that is usually interpreted relative to the METAVIR scoring system (F0 to F4) [29] (Supplementary Table S8) [28]. In our meta-analysis, only one of the seven included studies had used METAVIR, and the others used the NASH CRN scoring system. To incorporate all available data, we converted the METAVIR scores into their NASH CRN equivalents (Supplementary Table S6). After the conversion, one primary study moved from the significant fibrosis group to the advanced fibrosis group, changing the target condition from the one originally reported in the paper.
It should be noted that our meta-analysis results are based on test accuracy data reported by primary studies conducted in settings with a disease prevalence that exceeds that in most primary care settings. The limited number of studies available for meta-analysis precluded subgroup analyses and formal explorations of sources of heterogeneity, including those related to prevalence, age, sex, and comorbidity. Further studies with sufficient individual patient data are required to conduct subgroup analyses and draw valid conclusions about differences in performance across identifiable subgroups of NAFLD patients. Moreover, all studies included in our meta-analysis were performed in Western countries. This may limit the generalizability of our findings and indicates a need for further evaluations of test performance in different ethnic groups.
Although a list of recommended cut-offs was published for detecting different levels of fibrosis (Supplementary Table S8), studies were not consistent in using the same thresholds for each target condition, which is another factor related to variability in reported performance measures. We used the novel multiple thresholds meta-analysis model, which enabled us to use all available data and reduce the risk of an optimistic evaluation of the biomarker.
Information about the histological procedure, such as needle gauge, biopsy length, and number of portal tracts, was often not reported, although these factors affect the reliability of the reference standard.

Other Published Systematic Reviews
Since FibroTest™ was originally developed for the assessment of fibrosis in HCV patients, most of the available accuracy studies were performed in patients with viral hepatitis, and only a limited number evaluated the test in NAFLD patients. A recently published systematic review with a focus on the obese population (BMI over 30 kg/m²) assessed the performance of a number of non-invasive tests, including FibroTest™, and reported higher accuracy estimates than those obtained in our review [6]. That review pooled and reported results separately for studies that used low and high FibroTest™ thresholds. For significant fibrosis, its meta-analysis of two studies using low thresholds resulted in a pooled sensitivity and specificity of 0.67 (95% CI: 0.59 to 0.74) and 0.75 (95% CI: 0.70 to 0.80), respectively, whereas for the same target condition at high thresholds, the pooled sensitivity and specificity were 0.13 (95% CI: 0.07 to 0.22) and 0.99 (95% CI: 0.98 to 0.99), respectively. For advanced fibrosis, the review reported slightly higher sensitivity in the meta-analyses of both low and high thresholds, 0.83 (95% CI: 0.77 to 0.88) and 0.46 (95% CI: 0.35 to 0.57), and lower specificity, 0.63 (95% CI: 0.59 to 0.67) and 0.94 (95% CI: 0.92 to 0.96), respectively. By comparing the test with other single biomarkers, the authors suggested that complex panels, including FibroTest™, can perform more accurately. Unfortunately, that review included only a small number of studies and did not consider the variability in histological scoring systems across the included primary studies.
Two other systematic reviews have reported slightly higher accuracy levels for FibroTest™ in detecting significant fibrosis: an AUC of 0.78 (95% CI: 0.72 to 0.85) and 0.84 (95% CI: 0.76 to 0.92) [54,55]. These reviews were based on the results of only two primary studies and one primary study, respectively.

Implications
FibroTest™ is a panel of markers recommended by the WHO [56], the American Association for the Study of Liver Diseases (AASLD) [57,58], the European Association for the Study of the Liver (EASL) [12], and the Asia-Pacific Association for the Study of the Liver (APASL) [59] for evaluating hepatic fibrosis in patients with viral hepatitis. It was also shown to have high predictive values for significant lesions in ALD patients [24]. Due to the similarity of fibrosis features between ALD and NAFLD patients, it was then proposed for evaluating fibrosis levels in NAFLD patients [25].
FibroTest™ is currently available in many countries and usually used in combination with other blood tests, including SteatoTest for steatosis grading and ActiTest for inflammation activity grading. The EASL clinical practice guideline (2016) recommends that surrogate non-invasive markers of fibrosis, such as NAFLD Fibrosis Score (NFS), Enhanced Liver Fibrosis (ELF), and FibroTest™, which have acceptable diagnostic accuracy (AUC > 0.80), should be used in NAFLD patients, to rule out significant fibrosis [16]. This recommendation was based on only two studies that evaluated FibroTest™. In our systematic review, based on five studies, including more recent ones, we observed a lower AUC (0.77). Our meta-analysis results showed that FibroTest™ has acceptable diagnostic performance only in detecting cirrhosis in NAFLD patients (AUC: 0.92).
Other recommended non-invasive markers for NAFLD patients have also been documented in the literature, with variable performance in detecting different degrees of fibrosis. For instance, ELF showed acceptable accuracy (≥0.80) in detecting significant and advanced fibrosis [60]. However, as with FibroTest™, its better diagnostic performance at higher thresholds and in high-prevalence settings calls for careful consideration of the likely disease prevalence in the intended-use setting and for the adoption of suitable test thresholds to achieve the desired performance.
In addition to serum-based markers, other tests, such as those based on elastography, have been described in the literature, often with promising diagnostic performance [6]. A recently published comparative study evaluated the most validated fibrosis tests, including Fibrosis-4 (FIB-4), NFS, FibroTest™, and Fibroscan. The findings of the study showed significantly better performance of Fibroscan in detecting NAFLD-related advanced fibrosis compared to all other evaluated blood tests [48]. This indicates that, when available, elastography-based tests can be useful as first-line procedures as they give an immediate result after a quick and easy-to-perform examination [48]. Yet, difficulties in performing these tests in obese patients, limitations in distinguishing between steatosis and steatohepatitis, and a lack of sufficient paired studies make comparisons with other markers difficult.
A few accuracy studies have evaluated the performance of FibroTest™ in comparison with other blood tests in detecting different stages of fibrosis [46,51,52]. One study showed that more complex models, such as Hepascore, FibroTest™, and FIB-4, can identify advanced fibrosis in NAFLD patients significantly better than simple models, such as the aspartate aminotransferase (AST) to Platelet Ratio Index (APRI) [51]. Yet another comparison showed that none of the commonly used approaches, including FibroTest™, FIB-4, APRI, and NFS, performed significantly better than plasma AST in detecting diabetic NAFLD patients with advanced fibrosis [46]. These inconsistent conclusions highlight the need for further well-designed comparisons in the intended-use population. More comparative accuracy studies of high methodological quality are necessary for a valid appraisal of the performance of FibroTest™ relative to other non-invasive markers.

Conclusions
Our meta-analysis of the available evidence showed acceptable diagnostic performance (AUC > 0.80) of FibroTest™ only in detecting cirrhosis, whereas the EASL-EASD-EASO Clinical Practice Guidelines recommended FibroTest™ as a non-invasive test with acceptable diagnostic accuracy for detecting fibrosis and NAFLD progression [16]. In primary, secondary, and tertiary care settings with a 10-50% disease prevalence, FibroTest™ can achieve a high NPV at sensitivities between 0.90 and 0.98, demonstrating its ability to rule out advanced fibrosis in NAFLD patients. However, clinicians should be aware of the low specificity at the corresponding thresholds, which leads to a considerable number of false-positive results and potentially to invasive and expensive follow-up evaluations, such as liver biopsy. Because these estimates are model projections, further studies conducted in primary care settings are needed.

Data Availability Statement: The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.