Accuracy of Noninvasive Diagnostic Tests for the Detection of Significant and Advanced Fibrosis Stages in Nonalcoholic Fatty Liver Disease: A Systematic Literature Review of the US Studies

Background: The purpose of this systematic literature review (SLR) was to evaluate the accuracy of noninvasive diagnostic tools in detecting significant or advanced (F2/F3) fibrosis among patients with nonalcoholic fatty liver (NAFL) in the US healthcare context. Methods: The SLR was conducted in PubMed and Web of Science, with an additional hand search of public domains and citations, in line with the PRISMA statement. The study included US-based original research on diagnostic test sensitivity, specificity and accuracy. Results: Twenty studies were included in qualitative evidence synthesis. Imaging techniques with the highest diagnostic accuracy in F2/F3 detection and differentiation were magnetic resonance elastography and vibration-controlled transient elastography. The most promising standard blood biomarkers were NAFLD fibrosis score and FIB-4. The novel diagnostic tools showed good overall accuracy, particularly a score composed of body mass index, GGT, 25-OH-vitamin D, and platelet count. The novel approaches in liver fibrosis detection successfully combine imaging techniques and blood biomarkers. Conclusions: While noninvasive techniques could overcome some limitations of liver biopsy, a tool that would provide a sufficiently sensitive and reliable estimate of changes in fibrosis development and regression is still missing.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease worldwide, affecting around a quarter of the general population and a third of the United States (US) population [1][2][3].
Over time, NAFLD may progress to nonalcoholic steatohepatitis (NASH), which is considered a more progressive form of the disease. NASH is histologically defined as hepatic steatosis, inflammation, and ballooning (enlarged cells with rarefied cytoplasm [4]) with or without fibrosis, caused by lipotoxicity of accumulated lipids in hepatocytes and immune cell activation [5,6]. It is estimated that NASH affects up to 6.5% of the general population worldwide and 3-4% of the US population [2,7]. The diagnosis is more common among obese and diabetic patients, occurring in around 30% and 65% of cases, respectively [8].
One of the most common complications of NAFLD and NASH is liver fibrosis, occurring in more than a third of NASH patients over a 5-year period [9]. The level of liver fibrosis in NAFLD is commonly scored using the NASH CRN system, where stage 0 represents no fibrosis; stage 1 denotes pericellular fibrosis; stage 2 denotes centrilobular and periportal fibrosis; stage 3 denotes bridging fibrosis; and stage 4 represents cirrhosis [10]. Fibrosis stage F2 or higher (F2+) is considered significant fibrosis. Advanced fibrosis traditionally refers to stage F3 or higher (F3+) [11].
About 8% of the general population and 13% of the high-risk population are assumed to have undetected advanced fibrosis [12][13][14]. A recent population analysis of data from National Health and Nutrition Examination Survey (NHANES) established that around 7.5% of NAFLD patients had advanced fibrosis [15]. Liver fibrosis is the most important prognostic factor in the course of NAFLD, as it is the only pathological finding that correlates with hepatic decompensation events and liver-related mortality [16,17]. Early and accurate diagnosis and staging of fibrosis in NAFLD and NASH patients, particularly those with significant and advanced fibrosis (F2 and F3 stages), is necessary to determine the patient's prognosis and guide clinical decision-making [11,18].
Universal access to a consistently accurate and minimally invasive diagnosis for patients with significant or advanced fibrosis would ensure appropriate disease management and a better prognosis. However, studies summarizing the evidence on the ability of noninvasive diagnostic tests to accurately detect F2+ and F3+ fibrosis stages in NAFLD and NASH patients are lacking for the US setting.
The objective of the current systematic literature review (SLR) was to collect, summarize, and interpret published evidence from US studies on the accuracy of currently available diagnostic tests in detecting and longitudinally monitoring F2+/F3+ fibrosis stages in NAFLD and NASH patients. As the prevalence and natural course of NAFLD vary across the continents, races, and ethnic groups, the study focused on original research articles conducted in the US, assuming a similar demographic distribution across the included studies and enhancing the comparability of their results. The availability of different imaging techniques and biomarkers also varies across different countries and the study targeted currently existing noninvasive diagnostic tests available in the US.

Data Sources and Selection Criteria
The key literature databases for the SLR were the Medical Literature Analysis and Retrieval System Online (MEDLINE®), accessed via PubMed, and Web of Science (January 2016 through May 2022). As all the outcomes were considered time-sensitive and dependent on evolving methodological approaches, a 6-year time constraint was applied to provide a current update of existing literature reviews. In addition, a hand search was performed across publicly available domains (e.g., Google Scholar) and reference lists to ensure all relevant studies were included. Only primary original research studies were considered, while SLRs, meta-analyses, narrative reviews, and guidelines were excluded. Selection criteria are shown in Table 1.

Search Strategy
The search query (Table S1) was constructed to address diagnostic tool accuracy in detecting and tracking F2+ and F3+ fibrosis stages, following the population, intervention, comparators, and outcomes (PICO) criteria (Table 2).

The detailed search strategy and yielded hits are presented in the Supplementary Materials (Table S1). The population included patients with NAFLD and/or NASH, also considering the underlying stages of the disease, i.e., nonalcoholic steatosis, fibrosis, and cirrhosis. All diagnostic tests, imaging techniques, and biomarkers were considered to explore the accuracy of diagnostic tools in terms of sensitivity, specificity, positive and negative predictive values (PPV and NPV), and area under the receiver operating characteristic (AUROC) curve.
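The accuracy measures named above can be illustrated with a minimal sketch. The 2x2 counts and score lists below are hypothetical and are not taken from any included study; the AUROC is computed in its rank-based form (the probability that a randomly chosen patient with the target fibrosis stage scores higher than a randomly chosen patient without it), which is one standard way to obtain it.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 table of an index test against the reference standard (e.g., biopsy)."""
    tp: int  # test positive, fibrosis stage present
    fp: int  # test positive, fibrosis stage absent
    fn: int  # test negative, fibrosis stage present
    tn: int  # test negative, fibrosis stage absent


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def auroc(diseased_scores, non_diseased_scores) -> float:
    """Rank-based AUROC: share of diseased/non-diseased pairs in which the
    diseased patient has the higher score (ties count as half)."""
    wins = 0.0
    for d in diseased_scores:
        for h in non_diseased_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(non_diseased_scores))


# Hypothetical counts and scores, for illustration only.
counts = ConfusionCounts(tp=18, fp=12, fn=2, tn=68)
metrics = diagnostic_metrics(counts)
example_auroc = auroc([3.1, 4.0, 2.5], [2.0, 2.5, 1.8])
print(metrics, example_auroc)
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the values reported in the Results should be read.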

Data Review and Extraction
The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Two independent reviewers performed the database search, title and abstract screening, and full-text screening. A third reviewer resolved any disagreements. Predefined extraction tables were used for data collection and evidence summary.

Results
The study selection process is shown in the PRISMA flow diagram (Figure 1). After excluding duplicates, 1633 studies were title and abstract screened, 415 publications were full-text screened, and 20 studies were selected for data extraction.
The SLR resulted in 20 original US-based studies that evaluated a wide spectrum of noninvasive imaging modalities and biomarkers for the detection of F2+ and F3+ stage fibrosis. The characteristics of included studies are presented in Table 3 while the full list of diagnostic techniques is presented in Figure 2.

Imaging Techniques
The SLR identified eight studies that explored the diagnostic accuracy and capabilities of imaging techniques in terms of significant (F2+ fibrosis stages) and advanced (F3+ fibrosis stages) fibrosis detection (Table 3). The diagnostic accuracy of imaging techniques in detection of significant and advanced fibrosis is presented in Tables 4 and 5.


Vibration-Controlled Transient Elastography (VCTE)
In a randomized clinical trial that tested the baseline performance of different noninvasive diagnostic techniques, Harrison et al. reported a low predictive value of VCTE in differentiating fibrosis stages in NASH patients with biopsy-proven F1-F3 fibrosis (AUROC 0.630 for F2+, 0.650 for F3+) [20]. On the other hand, in a prospective study of liver transplant recipients, Siddiqui et al. found that VCTE detects significant and advanced fibrosis with reliable accuracy. By fixing the sensitivity at 90%, the authors demonstrated its potential as a rule-out tool for significant fibrosis in case of negative results (cutoff 7.4 kPa) among post-liver transplantation patients. VCTE with a cutoff value of 10.5 kPa was a better rule-out technique for advanced fibrosis than for significant fibrosis (AUROC 0.940 vs. 0.870, respectively). Still, when the specificity was fixed at 90%, the method yielded a low PPV for both significant and advanced fibrosis (67% and 64%), implying that the tool cannot be reliably used for ruling in the higher-grade fibrosis stages in diagnostic practice [23]. Similarly, a single-center retrospective cohort study by Trowell et al. reported reliable accuracy and rule-out potential of VCTE in advanced fibrosis [25].
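Fixing the sensitivity at 90% to derive a rule-out cutoff can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the cited authors: the liver stiffness values are hypothetical, a test is called positive when the measurement is at or above the cutoff, and the function names are introduced here for illustration only.

```python
import math


def rule_out_cutoff(diseased_values, target_sensitivity=0.90):
    """Largest observed cutoff at which 'value >= cutoff' still flags at
    least the target share of patients who have the fibrosis stage."""
    ordered = sorted(diseased_values, reverse=True)
    # Small tolerance guards against floating-point noise in the product.
    needed = max(1, math.ceil(target_sensitivity * len(ordered) - 1e-9))
    return ordered[needed - 1]


def npv_at_cutoff(diseased_values, non_diseased_values, cutoff):
    """Negative predictive value when values below the cutoff are negative."""
    true_neg = sum(v < cutoff for v in non_diseased_values)
    false_neg = sum(v < cutoff for v in diseased_values)
    return true_neg / (true_neg + false_neg)


# Hypothetical liver stiffness measurements in kPa, for illustration only.
diseased = [5.9, 7.0, 8.0, 9.2, 10.5, 11.0, 12.3, 14.8, 16.0, 21.5]
non_diseased = [3.8, 4.2, 4.5, 5.0, 5.3, 5.6, 6.0, 6.4, 7.2, 8.1]

cutoff = rule_out_cutoff(diseased)
npv = npv_at_cutoff(diseased, non_diseased, cutoff)
print(cutoff, round(npv, 3))
```

A cutoff chosen this way misses few patients by construction, so a result below it supports ruling out the condition; it says little about how trustworthy a positive result is, which is why the rule-in performance is assessed separately at a fixed specificity.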

Shear Wave Elastography (SWE)
In a retrospective analysis by Ozturk et al., SWE with a cutoff value of 8.4 kPa performed well in detecting significant and advanced fibrosis in patients with suspected or diagnosed NAFLD, and the authors concluded that SWE may be useful in identifying patients at risk of liver morbidity and mortality [22]. Zhang et al. reported better diagnostic performance of SWE in detecting significant and advanced fibrosis among patients with diagnosed or suspected NAFLD. This cross-sectional study pointed out the potential of using SWE as a rule-out diagnostic tool for F2+ and F3+ fibrosis stages (cutoffs 1.49 m/s and 1.46 m/s, respectively). However, using cutoffs with specificity ≥ 90% for significant and advanced fibrosis incorrectly classified approximately every second patient with a positive test result (58.8% and 55.6% PPV, respectively) due to the low prevalence of the condition in the tested sample [26].
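The dependence of PPV on prevalence follows directly from Bayes' rule and can be shown with a short sketch. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only to show the direction of the effect; they are not the values reported by Zhang et al.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (50% sensitivity, 90% specificity) applied to
# samples with low and high prevalence of the target fibrosis stage.
ppv_low_prevalence = ppv(0.5, 0.9, 0.10)
ppv_high_prevalence = ppv(0.5, 0.9, 0.50)
print(round(ppv_low_prevalence, 3), round(ppv_high_prevalence, 3))
```

Even at a specificity of 90%, the same test yields a much lower PPV when few patients in the sample have the condition, because false positives from the large non-diseased group outnumber true positives.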

Magnetic Resonance Elastography (MRE)
Zhang et al. showed that MRE at a 2.77 kPa cutoff value (for both fibrosis stages) could be used as an ideal rule-out diagnostic tool, while at a fixed specificity of ≥90%, it performed significantly better than SWE in accurately detecting patients with fibrosis F2+ and F3+ stages. Still, the study concluded that neither of the two techniques performed well enough to replace biopsy in detecting significant and advanced fibrosis [26]. In a retrospective analysis of patients with suspected or diagnosed NAFLD from two medical centers, Tang et al. reported almost perfect diagnostic accuracy of MRE in detecting advanced fibrosis (cutoffs of 3.6 kPa and 3.65 kPa). Based on these results, the MRE correctly detected the absence of F3+ fibrosis in nearly all patients, with lower but acceptable rule-in potential [24].
Jayakumar et al. reported higher accuracy of MRE in tracking fibrosis improvement than progression among patients diagnosed with F2 and F3 fibrosis stages at ≥0% reduction cutoff (AUROC 0.790). The tool performed better in detecting improvement among F3 patients than F2 patients, with a high difference in specificity (86% and 33%, respectively) but similar sensitivity (69% and 60%, respectively). The accuracy of detecting F2 and F3 fibrosis progression was modest at ≥0% improvement cutoff (AUROC 0.570), but a high NPV value (88%) implies the method can still be used to rule out the fibrosis progression [21].

Magnetic Resonance Imaging-Derived Liver Surface Nodularity (MRI-Derived LSN) Score
A single-arm prospective study performed by Catania et al. showed that MRI-derived LSN score was a reliable tool in detecting significant and advanced fibrosis among patients with NAFLD (AUROCs 0.800 for F2+, 0.860 for F3+) [19].
Diagnostic accuracy of imaging techniques is presented in Table 4 for significant fibrosis and Table 5 for advanced fibrosis.

Established Fibrosis Scores and Biomarkers
Our SLR identified 11 studies that reported diagnostic accuracy of established general scores and biomarkers in detecting significant and advanced fibrosis (Table 3). Tables 6 and 7 summarize the diagnostic performance of the established scores and biomarkers in detecting significant and advanced fibrosis.

NAFLD Fibrosis Score (NFS)
In general, the included studies reported low diagnostic accuracy of NFS, with AUROC values ranging from 0.600 to 0.640 for the detection of significant fibrosis [20,30]. Harrison et al. reported a relatively low sensitivity (66%) and specificity (52%) of NFS in significant fibrosis detection among NASH patients at a cutoff value of 0.9. Corey et al. reported an 85% specificity with a 67% PPV for NFS [30]. As for advanced fibrosis,

AST to Platelet Ratio Index (APRI)
APRI showed low diagnostic accuracy for the detection of significant fibrosis, with an AUROC of 0.660 at a cutoff value > 0.42 [33]. In contrast, for the detection of advanced fibrosis, APRI yielded good diagnostic accuracy (AUROC ranging from 0.680 to 0.860), with high specificity (75-99%), indicating that it may be a reliable tool to rule in patients with advanced fibrosis [27,28,32,34,35].
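For orientation, the commonly published formulas for APRI and the FIB-4 index can be sketched as below. The formulas are not restated in this review and come from the general literature; the upper limit of normal for AST (40 U/L here) is laboratory-dependent and is an assumption, and the patient values are hypothetical.

```python
import math


def apri(ast_u_l: float, platelets_10e9_l: float, ast_uln_u_l: float = 40.0) -> float:
    """AST to Platelet Ratio Index: (AST / upper limit of normal) * 100 / platelets.
    The default upper limit of normal (40 U/L) is an assumption."""
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient, for illustration only.
example_apri = apri(ast_u_l=60, platelets_10e9_l=150)
example_fib4 = fib4(age_years=50, ast_u_l=60, alt_u_l=36, platelets_10e9_l=150)
print(round(example_apri, 2), round(example_fib4, 2))
```

Both scores rely only on age and routine laboratory values, which is why they are inexpensive to compute and widely available.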

BARD Score
Balakrishnan et al. reported moderate diagnostic accuracy of the BARD score (based on BMI, AST, ALT, and the presence of diabetes mellitus) in detecting advanced fibrosis among predominantly Hispanic NAFLD patients, with high sensitivity, which indicated the BARD score would be reliable for ruling out advanced fibrosis [27].
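A minimal sketch of how the BARD score is usually computed is given below. The point weights (1 point for BMI ≥ 28 kg/m², 2 points for an AST/ALT ratio ≥ 0.8, 1 point for diabetes) follow the commonly cited definition from the general literature and are not stated in this review, so they should be treated as an assumption; the patient values are hypothetical.

```python
def bard_score(bmi: float, ast_u_l: float, alt_u_l: float, has_diabetes: bool) -> int:
    """BARD score on a 0-4 scale, using commonly cited point weights (assumed)."""
    score = 0
    if bmi >= 28:
        score += 1  # BMI component
    if ast_u_l / alt_u_l >= 0.8:
        score += 2  # AST/ALT ratio component
    if has_diabetes:
        score += 1  # diabetes component
    return score


# Hypothetical patients, for illustration only.
high_risk = bard_score(bmi=31, ast_u_l=48, alt_u_l=50, has_diabetes=True)
low_risk = bard_score(bmi=25, ast_u_l=30, alt_u_l=60, has_diabetes=False)
print(high_risk, low_risk)
```

Because a low score requires the absence of most risk components, the score tends toward high sensitivity, which is consistent with its use as a rule-out tool.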

Enhanced Liver Fibrosis (ELF) Test
In the study by Harrison et al., the ELF test demonstrated satisfying accuracy for detecting significant (cutoff −0.2) and advanced fibrosis (cutoff −0.1) among NASH patients [20]. Younossi et al. reported very high specificity of the ELF test and good reliability in ruling in patients with advanced fibrosis (cutoffs 9.8 and 11.3) among NAFLD patients with biopsy and VCTE as reference tools [36].

FibroTest
The FibroTest demonstrated modest accuracy in the detection of advanced fibrosis among patients with NAFLD [28]. At cutoff values < 0.3 and > 0.7, the FibroTest demonstrated very high specificity, showing its potential for ruling in patients with advanced fibrosis. The main limitation of the FibroTest is that results between 0.3 and 0.7 remain unclassified [28].
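The consequence of using two cutoffs can be sketched as a three-way classification. The thresholds 0.3 and 0.7 come from the text above; reading scores below 0.3 as "advanced fibrosis unlikely" and scores above 0.7 as "advanced fibrosis likely" is an assumption made for this illustration, and the function name and labels are hypothetical.

```python
def classify_dual_cutoff(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Three-zone reading of a score with a lower and an upper cutoff."""
    if score < low:
        return "advanced fibrosis unlikely"
    if score > high:
        return "advanced fibrosis likely"
    return "indeterminate"


# Hypothetical scores, for illustration only.
labels = [classify_dual_cutoff(s) for s in (0.12, 0.45, 0.70, 0.83)]
print(labels)
```

Every patient who falls into the middle zone needs a second test or a biopsy, so the share of indeterminate results determines how useful a dual-cutoff test is in practice.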

Gamma-Glutamyl Transferase (GGT) Levels
Harrison et al. reported low diagnostic accuracy of serum GGT levels in detecting significant and advanced fibrosis among NASH patients [20]. Kulkarni et al. demonstrated slightly better accuracy in detecting significant fibrosis in terms of sensitivity and specificity among NAFLD patients [31], but in general, GGT was marked as a biomarker with low diagnostic accuracy for fibrosis detection.

Aspartate Aminotransferase/Alanine Aminotransferase Ratio (AST/ALT Ratio)
Nielsen et al. assessed the diagnostic accuracy of the AST/ALT ratio for detecting significant and advanced fibrosis among patients with NASH and liver fibrosis [33]. The study denoted very high sensitivity (90%) with a cutoff value > 0.56 and suggested this biomarker could be reliable for ruling out significant fibrosis [33]. In contrast, the same study showed a low diagnostic accuracy of AST/ALT ratio in advanced fibrosis detection with a cutoff value > 0.78 [20,33]. On the other hand, Bril et al. reported very good accuracy of plasma AST levels in detecting advanced fibrosis among T2DM patients (AUROC of 0.850) at cutoff points of 40 U/L and 38 U/L [28].
Diagnostic accuracy of established fibrosis scores and biomarkers is presented in Table 6 for significant fibrosis and Table 7 for advanced fibrosis.

Procollagen Type-III N-Terminal Peptide (PRO-C3)
Nielsen et al. reported that plasma PRO-C3 has satisfying diagnostic accuracy (AUROC = 0.700) for the detection of significant fibrosis among NASH patients with high specificity (86%) and rule-in potential [33]. For advanced fibrosis, AUROC of 0.730 was reported [33]. The cross-sectional study by Bril et al. identified PRO-C3 as one of the most reliable biomarkers for advanced fibrosis detection with a specificity of 96% and 0.900 AUROC at 20 ng/mL cutoff point [28].

Monocyte Chemoattractant Protein 1 (MCP-1)
MCP-1 demonstrated low diagnostic accuracy, close to chance level, for the detection of significant and advanced fibrosis among NASH patients (AUROC of 0.520 and 0.510, respectively) [20]. In contrast, the high specificity of the MCP-1 biomarker reported in the same study indicated it could be reliable to rule in patients with significant and advanced fibrosis (87% and 93%, respectively).

NAFLD Fibrosis Protein Panel (NFPP) and a Disintegrin and Metalloproteinase with Thrombospondin Motifs like 2 (ADAMTSL2)
A retrospective study reported on two novel biomarkers for the detection of significant fibrosis among NAFLD patients: ADAMTSL2 and NFPP, a combination of 8 sensitive proteins [30]. Both biomarkers showed high diagnostic accuracy with AUROC of 0.830 for the detection of significant fibrosis. Additionally, combining NFPP with general clinical features (age, BMI, sex, and diabetes status), with the FIB-4 index, or with NFS improved its diagnostic accuracy (AUROC 0.870) [30].

Kulkarni Model
A large 10-year retrospective study of pediatric patients who underwent liver biopsy identified the strongest predictors of significant liver fibrosis. The model included body mass index, vitamin D, platelet count, and GGT and showed very good predictive ability, with sensitivity and specificity of more than 80% and AUROC of 0.944 [31].

ADAPT Score
The ADAPT score, based on PRO-C3 levels, T2DM, platelet count, and age, demonstrated satisfying accuracy in the detection of significant and advanced fibrosis among patients with definite NASH and liver fibrosis (AUROC 0.760 for F2+, 0.800 for F3+) [33].

MEFIB Index
MEFIB index was determined using MRE with a cutoff value ≥ 3.3 kPa and FIB-4 index with a cutoff value ≥ 1.6 and provided a very high accuracy level for the detection of significant fibrosis [37]. Almost perfect specificity suggested this tool would be reliable to rule in NAFLD patients with significant fibrosis.
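The two-component rule described above can be sketched as follows. The thresholds (MRE ≥ 3.3 kPa and FIB-4 ≥ 1.6) are taken from the text; treating the index as positive only when both criteria are met is an assumption made for this illustration, and the function name and patient values are hypothetical.

```python
def mefib_positive(mre_kpa: float, fib4_index: float,
                   mre_cutoff: float = 3.3, fib4_cutoff: float = 1.6) -> bool:
    """Positive only when both the MRE and the FIB-4 criteria are met (assumed rule)."""
    return mre_kpa >= mre_cutoff and fib4_index >= fib4_cutoff


# Hypothetical patients, for illustration only.
both_elevated = mefib_positive(mre_kpa=3.8, fib4_index=2.1)
only_mre_elevated = mefib_positive(mre_kpa=3.8, fib4_index=1.1)
print(both_elevated, only_mre_elevated)
```

Requiring agreement between an imaging measurement and a blood-based score reduces false positives, which fits the very high specificity and rule-in potential reported for the index.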

FAST Score
FAST score combines liver stiffness measurement (LSM) and controlled attenuation parameter measured by VCTE (e.g., FibroScan) and serum levels of AST [38]. Overall, good accuracy of the FAST score was demonstrated in detecting definite NASH (NAFLD activity score ≥ 4 and significant fibrosis) among patients with NAFLD. A FAST score with lower cutoff values (0.35 and 0.38) demonstrated good ability to rule out F2+ [38].

Cohort-Specific Model and Combination of 6 Biomarkers
The model that included serum CK-18, fasting insulin, platelet count, sex, and HbA1c demonstrated good performance with an AUROC of 0.860 for advanced fibrosis detection among patients with T2DM [28]. A combination of 6 noninvasive tools (PRO-C3, APRI, AST, FIB-4 index, FibroTest, and NFS) showed very reliable performance with an AUROC of 0.910 in detecting advanced fibrosis among NAFLD patients with T2DM [28].

Prognostic Factor Model
The model combining alkaline phosphatase, HbA1c, platelet count, and international normalized ratio performed well in detecting advanced fibrosis among NAFLD patients. However, high sensitivity and lower specificity indicated this noninvasive panel would be only reliable in ruling out patients with advanced fibrosis [29].
Accuracy of novel diagnostic tools is presented in Table 8 for significant fibrosis and Table 9 for advanced fibrosis.

Discussion
This SLR provides a comprehensive current overview of diagnostic tools for detection and monitoring of NASH-related liver fibrosis staging based on the summarized evidence from the US studies. The diagnostic accuracy was validated against liver biopsy as a standard diagnostic tool in all studies, except for the Caussy et al. study where both liver biopsy and MRE were used [29], and that of Younossi et al. where VCTE and biopsy were reference tools [36].
There is a remarkable shift in the diagnostic pathways from biopsy as the reference standard to novel, less invasive techniques, imaging methods, and blood biomarkers. Still, the collected evidence implies there is no perfect noninvasive tool capable of capturing and tracking all the aspects of the complex pathological process resulting in fatty liver, liver fibrosis, and cirrhosis. Liver biopsy often remains necessary in particular for clinical trials.
VCTE (FibroScan®) is a noninvasive ultrasound-based imaging method that measures the speed of passage of acoustic shear waves through the liver tissue to estimate liver stiffness. Our SLR provides collected evidence on good overall accuracy of VCTE. A prospective study conducted on liver transplant recipients by Siddiqui et al. reported high accuracy of LSM in the detection of significant fibrosis (AUROC of 0.870) and advanced fibrosis (AUROC of 0.940). Still, the low PPV values indicate that the tool should be used carefully when ruling in the conditions [23]. Similar conclusions about the lower rule-in potential of VCTE were drawn in a retrospective study by Trowell et al. [25]. Harrison et al. demonstrated a lower performance of VCTE in the detection of significant and advanced fibrosis among NASH patients (AUROC of 0.630 and 0.650, respectively) [20].
Another ultrasound-based imaging technique that demonstrated good accuracy with AUROC ranging from 0.730 to 0.850 was SWE for detecting significant and advanced fibrosis [22,26]. In the retrospective study conducted by Ozturk et al., SWE demonstrated good accuracy in a small sample of NAFLD patients with very advanced liver fibrosis [22]. Slightly better diagnostic accuracy of SWE in the detection of significant and advanced fibrosis was reported by Zhang et al. in their cross-sectional study conducted in the sample of 100 NAFLD patients (AUROC of 0.810 and 0.850, respectively) [26].
A promising imaging diagnostic method is MRE, which computes transversal images of liver tissue to capture the propagation of shear waves through the tissue. Results from our SLR are consistent with the previously published data. A retrospective study conducted by Tang et al. reported very high diagnostic capabilities for the MRE technique in the detection of advanced fibrosis among NAFLD patients (AUROC of 0.939-0.947 at 3.6-3.65 kPa cutoffs). However, this study was conducted on a small sample (19 patients) and the results should be further validated [39]. Zhang et al. compared diagnostic accuracy of MRE with SWE. Although MRE performed better in differentiating lower stages of fibrosis (F1+ and F2+), there was no difference between the tools in detecting F3+ fibrosis [40]. MRE was the only diagnostic tool captured in the SLR that tracked fibrosis improvement or progression in patients with F2 or F3 fibrosis stage. Even though the study had a small sample size and reported low sensitivity and specificity after 24 weeks from baseline fibrosis measurement, the authors concluded that MRE-LSM could potentially replace biopsy in evaluating longitudinal fibrosis changes [21].
The MRI-derived LSN score showed the lowest diagnostic accuracy for detecting significant and advanced fibrosis among all imaging techniques captured in the SLR when comparing the reported AUROCs. Despite a very high correlation between LSN score and level of fibrosis in overweight and obese patients with biopsy-proven NAFLD, AUROC values of 0.800 and 0.860 were the lowest compared to other imaging techniques [19].
Serum levels of the liver enzymes AST and ALT are widely used blood biomarkers for the diagnosis of multiple conditions. Due to their low cost and wide availability, they often serve as a starting point in fatty liver disease assessment [28]. However, the accuracy of AST and ALT levels in predicting significant and advanced fibrosis may be affected by other hepatic co-morbidities, patients' characteristics, and associated conditions [41]. Accordingly, Harrison et al. demonstrated modest diagnostic accuracy among the population of patients diagnosed with NASH [20]. In contrast, Bril et al. suggest liver enzymes may remain the main diagnostic biomarker of advanced fibrosis in patients with T2DM due to their availability and high accuracy in excluding advanced cirrhosis, particularly in comparison with more costly and more complicated diagnostic options that turn out to perform equally well as AST/ALT levels in this population [28]. The authors recommended a sequential approach incorporating AST followed by another noninvasive tool for detecting advanced liver fibrosis, suggesting this approach would help avoid unnecessary liver biopsies [28].
Simple non-proprietary clinical scores (NFS, FIB-4, APRI) are cost-effective and sensitive enough to rule out the disease at lower thresholds. Still, they are not accurate enough to confirm the diagnosis of advanced fibrosis. BARD score and GGT were inferior in the detection of advanced fibrosis and showed modest accuracy with an AUROC in a range of 0.620-0.760 [20,27]. The Balakrishnan et al. study concluded that all investigated scores (NFS, APRI, BARD, FIB-4) have a moderate discriminatory ability for advanced fibrosis with AUROCs 0.700-0.790 [27] in predominantly Hispanic NAFLD patients. Similarly, the findings from Marella et al.'s study implied that noninvasive scores may be unreliable in the African American population and should be tested in larger multicenter studies [32]. Another study performed among T2DM patients with obesity also reported the modest accuracy of the scores in this population, with only FIB-4 showing a trend toward better accuracy [34]. Additionally, in a large observational study with more than two thousand patients, Udelsman et al. demonstrated low specificity of all noninvasive scoring systems in patients undergoing bariatric surgery [35]. Thus, although generally good accuracies imply the scoring systems can be reliable tools for the detection of significant and advanced fibrosis in everyday clinical practice, the modest performance in high-risk patients imposes the need for a more reliable screening assessment.
More expensive techniques that evaluate direct fibrosis markers (i.e., fibrosis markers in extracellular matrix components), such as the ELF test, CK-18 fragments, and PRO-C3, demonstrated higher sensitivity in detecting significant and advanced fibrosis. Based on the included studies, these tests often cannot accurately differentiate progression or regression in diagnosed patients, and predominantly demonstrated modest diagnostic accuracy (AUROC < 0.8). Harrison et al. reported that despite suboptimal performance of noninvasive biomarkers in general, ELF demonstrated somewhat better diagnostic ability in fibrosis detection [20]. A large retrospective cohort analysis by Younossi et al. emphasized that ELF may be a very valuable tool for advanced fibrosis detection with high NPV and PPV, but it relies on multiple cohort-specific cutoff values that need to be validated before use in clinical practice [36].
The FibroTest demonstrated modest diagnostic accuracy in the detection of advanced fibrosis among T2DM patients [28]. The ADAPT score showed more promising results in the detection of significant and advanced fibrosis within the sample of patients with NASH and liver fibrosis [33]. The regression model developed by Kulkarni et al. demonstrated very high accuracy in the detection of significant fibrosis among NAFLD patients [31]. Still, the retrospective nature of the data and the lack of prospective validation prevent us from concluding about the potential utility of the combined biomarkers.
In recent years, several specific metabolomic profiles have been associated with different stages of disease in NAFLD patients, making them a good target for future research in NAFLD diagnostics. Further studies revealed that changes in levels of these metabolites additionally could reflect specific pathways of liver injury related to NASH or advanced fibrosis, making them compelling diagnostic biomarkers. Therefore, it has been suggested that a combination of blood metabolites could be a highly accurate diagnostic test for the detection of advanced fibrosis [29]. Furthermore, many researchers evaluated the diagnostic potential of different combinations of biomarkers and diagnostic scores for fibrosis detection to achieve greater accuracy and better prediction power.
Caussy et al. demonstrated that a combination of 10 serum metabolites including lipids, amino acids, and carbohydrates had a very good discriminatory ability for the detection of advanced fibrosis among patients with biopsy-proven NAFLD [29]. The specific panel of blood biomarkers showed greater diagnostic accuracy with higher AUROC values than the FIB-4 Index and NFS to detect advanced fibrosis, which was confirmed afterwards in two independent validation cohorts. Moreover, the panel demonstrated the ability to evaluate longitudinal changes in serum metabolites in assessing the disease progression, which is a valuable characteristic rarely seen among biomarkers [29]. Harrison et al. assessed the diagnostic accuracy of MCP-1 and liver fibrosis-specific protein [20], and Corey et al. evaluated the diagnostic accuracy of NFPP and ADAMTSL2 in the detection of significant and advanced fibrosis, aiming to detect the "protein-based signature of fibrosis" [30]. MCP-1 demonstrated a low level of diagnostic accuracy for detecting significant and advanced fibrosis among NASH patients [20], while NFPP and ADAMTSL2 showed promising results in the detection of advanced fibrosis among NAFLD patients (AUROC of 0.830) [30]. Decraecker et al. demonstrated in metabolic (dysfunction)-associated fatty liver disease (MAFLD) patients with liver stiffness measurements, FIB-4, and LIVERFASt, that noninvasive methods were correlated with overall and liver-related mortalities (p < 0.001), and with all-cause and liver-related outcomes (p < 0.001) [42].
Studies captured in this SLR provide insight into new diagnostic tools and panels that combine imaging techniques and blood-based biomarkers for detecting significant fibrosis. The FAST score is a novel technique that combines liver stiffness measurement and controlled attenuation parameter measured by VCTE with serum levels of AST. The score was already established in a European cohort, and Woreta et al. validated the results in the US population [38]. The FAST score demonstrated high diagnostic accuracy in the detection of significant fibrosis among NAFLD patients. Still, the modest PPV implies the score should be interpreted carefully when ruling in patients with significant fibrosis [38]. The MEFIB index established by Jung et al., which combines MRE liver examination with the FIB-4 index, demonstrated even better diagnostic accuracy for detecting significant fibrosis among NAFLD patients [37].
Our findings are in line with the results of other studies. Chalasani et al. reported that 27% of patients evaluated with VCTE yielded unreliable results [43], while a European retrospective study demonstrated the high accuracy of VCTE, with an AUROC of 0.800 at 9.9 kPa and 11.4 kPa cutoff values, for the detection of advanced fibrosis among patients with biopsy-proven NASH [44]. A previously published meta-analysis comparing the accuracy of VCTE and MRE in fibrosis detection concluded that MRE provides significantly greater accuracy, although both methods performed very well in NAFLD patients [45]. However, despite the good reliability of MRE in detecting and differentiating liver fibrosis, the decision to use one method over another depends on multiple factors, including the availability of the tool and cost-effectiveness. MRE also requires special equipment, software, and additional hardware beyond routine scanners, as well as experienced experts for results validation and interpretation [46]. Thus, it is unlikely that MRE will replace ultrasound-based imaging methods for the detection and longitudinal tracking of liver fibrosis in routine clinical practice in the near future. The overarching pitfall of all imaging methods is unreliable specificity and dependence on the screener's experience and subjectivity in determining the fibrosis stage. Furthermore, imaging techniques may be non-standardized, costly, not widely available, or inaccessible [45].
Similarly, although multiple noninvasive biomarkers are available on the market for detecting significant or advanced liver fibrosis, there is still an unmet need for a test that would provide more accurate staging and differentiation of fibrosis, as currently used noninvasive tests remain inconclusive in approximately 30% of patients [47]. The Noninvasive Biomarkers of Metabolic Liver Disease (NIMBLE) consortium demonstrated that only NIS4, the ELF test, and FibroMeter-VCTE met the predefined criteria (AUROC > 0.800) for accurate diagnosis of significant fibrosis (F2+ stages), while the ELF test and FibroMeter-VCTE met the criteria for successful determination of advanced fibrosis (F3+ stages). Other investigated tests (FIB-4 index, serum ALT levels, OWLiver, and serum PRO-C3 levels) did not satisfy the criteria in terms of diagnostic accuracy in the detection of F2+ and F3+ fibrosis stages [48]. No diagnostic test achieved sensitivity and specificity of >80% in the detection of any stage of fibrosis, which was the minimum acceptable level specified by payers in the US. Of all evaluated diagnostic panels, only NIS4 for the detection of significant fibrosis and FibroMeter-VCTE for the detection of advanced fibrosis met the criteria specified by the US healthcare providers (sensitivity and specificity > 75%) [48]. The European Association for the Study of the Liver (EASL) clinical practice guidelines for NASH listed several reliable biomarkers for fibrosis detection with AUROC values higher than 0.8, including NFS, FIB-4 index, ELF, and FibroTest [49,50]. Apart from fibrosis detection, the tests also predicted liver-related and overall mortality with good precision. Still, the guidelines emphasize that the tests can correctly distinguish advanced fibrosis from lower stages, but not significant fibrosis.
Additionally, the high NPVs show that the tests perform particularly well in excluding advanced fibrosis, so they can be used as a first-line strategy in risk stratification, while they are not as good in ruling in fibrosis [49]. The study points out that predictive values are highly dependent on fibrosis prevalence in the study population, which is generally higher than in the community. Thus, the generalizability of the study findings would need to be tested in larger populations of patients [49]. A recently published review reports on the use of serum fibrosis biomarkers based on routine biochemistry and VCTE as validated and well-incorporated screening strategies for identifying high-risk patients. Still, MRI techniques are seen as the most promising noninvasive diagnostic strategy, as they offer accurate fibrosis staging with the ability to assess therapeutic response [51]. Finally, a systematic review of available guidelines for NAFLD assessment concludes that fibrosis scores may help detect high-risk patients who may be referred to liver biopsy; still, all guidelines stress, as a research priority, the necessity of developing a noninvasive test that will replace liver biopsy [52].
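The dependence of predictive values on prevalence noted above can be illustrated with a minimal sketch based on Bayes' theorem. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from any study included in this review, and the function name `predictive_values` is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 80% sensitivity and 80% specificity, applied
# in a low-prevalence setting (5%) and a higher-prevalence setting (30%).
for prevalence in (0.05, 0.30):
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under these assumed inputs, the same test yields a much lower PPV and a higher NPV at the lower prevalence, which is consistent with the observation that such tests are better suited to ruling out than to ruling in advanced fibrosis in community settings.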
Liver biopsy remains the reference standard in fibrosis detection and classification [53,54]. However, a growing body of evidence highlights the limitations of liver biopsy, particularly in fibrosis detection [55]. Aside from its invasive nature and the risk of complications, sampling variability remains a major concern, as histological lesions are unevenly distributed throughout the liver tissue [56]. Further problems with pathological diagnosis arise from inter- and intra-observer variability. Therefore, evaluating test accuracy with an imperfect reference standard such as a liver biopsy poses the risk of underestimating NASH and fibrosis severity [55].

Study Strengths and Limitations
The major strength of this review is that it is the first study to systematically summarize and compare noninvasive diagnostic tools evaluated in the US healthcare context. As the majority of noninvasive diagnostic tools were compared to the biopsy gold standard, this review collects clinically relevant evidence, establishing a basis for decision making in the field. This SLR has several limitations. As in all literature reviews, it has to be acknowledged that the reliability of the findings depends on the methodology and validity of the primary studies. Only diagnostic tools with either sensitivity or specificity values were presented in the Results section. The SLR considered only two literature databases, PubMed and Web of Science. Another limitation is the heterogeneity of the included studies, which prevented us from synthesizing quantitative evidence and providing narrow point estimates of the effect measures. This review aimed to systematically summarize the recent trends in clinical practice and diagnostic research in NAFLD, identifying articles published from 2016 onwards. It has to be noted that the review may have omitted promising research tools published prior to 2016. Thus, the study results were discussed and interpreted with caution, paying particular attention to the existing body of evidence and ensuring the conclusions are in line with findings from previous systematic literature reviews [10,47,57,58]. Some of the included studies were conducted in small population samples; therefore, the demonstrated results may lack generalizability. Additionally, some diagnostic tools were assessed in only one published article, so the findings have to be interpreted with caution until confirmed in larger observational studies. Furthermore, the lack of direct head-to-head comparisons between the diagnostic strategies limits the possibility of unbiased comparison of diagnostic accuracy between the tools.
Our study concentrated on the diagnostic ability of noninvasive tools to identify and differentiate significant and advanced fibrosis (F2+ and F3+), stipulating that the diagnosis made at this stage may impact clinical decision-making and change the course of the disease. Still, the performance of the tools in detecting earlier stages of fibrosis (F1+) has not been reviewed, although it may be increasingly important in the disease course, as treatment measures at earlier stages may positively impact the disease outcomes [59]. Additionally, some of the multicenter studies cited here did not have a centralized pathological reading, which introduces substantial bias related to inter-pathologist variability. Finally, quality assessment and critical appraisal of the studies were not performed. The SLR presents only formally published data, which may lead to publication bias, as journals are strongly biased towards publishing only the studies that report significant differences in the results. Still, as the SLR was not primarily focused on treatment effectiveness, the probability of publication bias in our review is low.

Conclusions
Liver fibrosis detection, staging, and monitoring represent crucial points in the clinical assessment and risk evaluation of patients with nonalcoholic fatty liver disease, as fibrosis is the major predictor of patients' morbidity and mortality. Imaging techniques represent an important part of the management of patients with suspected liver fibrosis and are becoming increasingly incorporated into routine clinical practice. Imaging techniques overcome limitations of liver biopsy, such as discomfort, invasiveness, and repeated tissue sampling, while providing good overall accuracy in fibrosis detection. However, they are still not able to provide a sufficiently sensitive and reliable estimate of quantitative longitudinal and dynamic changes in fibrosis development and regression. Moreover, advanced imaging techniques like VCTE and MRE require costly equipment and trained personnel, so they are less available in clinical practices across the country.
Observing the wide spectrum of available biomarkers, including clinical scores and panels that combine several blood tests, it may be correctly concluded that noninvasive biomarkers play a significant and ever-increasing role in detecting liver fibrosis in patients with NAFLD, including high-risk subgroups of patients. On the other hand, observing the high sensitivity, specificity, and AUROC values presented in the studies, it could be falsely concluded that there are tools with almost perfect diagnostic abilities that may detect liver fibrosis with a precision equivalent to substantially more expensive imaging methods or even biopsy. Still, in reality, the perfect noninvasive biomarker has not been established so far. In general, noninvasive biomarkers demonstrated very good accuracy in excluding significant or advanced liver fibrosis. The highest accuracy was observed among scores that combine several biomarkers, metabolites, and clinical parameters. Still, the studies generally conclude that all these tests have limited power in detecting and quantifying fibrosis levels, which is necessary for patient management and monitoring of disease progression. Several innovative technologies that demonstrated promising initial results in small patient cohorts have to be externally validated in wider independent studies. There is still an unmet need for a noninvasive biomarker that can detect, measure, and differentiate fibrosis stages with great sensitivity and specificity. Additionally, the optimal diagnostic tool has to be easily applicable and affordable for patients, providers, and healthcare centers at all levels, due to the rising prevalence of the disease among all age categories, ethnicities, and risk groups. Nonetheless, because of the variability of the biopsy itself as well as of the pathological reading, the ultimate validation of noninvasive markers will involve their ability to predict clinical events rather than a particular histological lesion.