Assessment of Liver Fibrosis Stage Using Integrative Analysis of Hepatic Heterogeneity and Nodularity in Routine MRI with FIB-4 Index as Reference Standard

Image-based quantitative methods for liver heterogeneity (LHet) and nodularity (LNod) provide helpful information for evaluating liver fibrosis; however, their combinations are not fully understood in liver diseases. We developed an integrated software for assessing LHet and LNod and compared LHet and LNod according to fibrosis stages in chronic liver disease (CLD). Overall, 111 CLD patients and 16 subjects with suspected liver disease who underwent liver biopsy were enrolled. The procedures for quantifying LHet and LNod were bias correction, contour detection, liver segmentation, and LHet and LNod measurements. LHet and LNod scores among fibrosis stages (F0–F3) were compared using ANOVA with Tukey’s test. Diagnostic accuracy was determined by calculating the area under the receiver operating characteristics (AUROC) curve. The mean LHet scores of F0, F1, F2, and F3 were 3.49 ± 0.34, 5.52 ± 0.88, 6.80 ± 0.97, and 7.56 ± 1.79, respectively (p < 0.001). The mean LNod scores of F0, F1, F2, and F3 were 0.84 ± 0.06, 0.91 ± 0.04, 1.09 ± 0.08, and 1.15 ± 0.14, respectively (p < 0.001). The combined LHet × LNod scores of F0, F1, F2, and F3 were 2.96 ± 0.46, 5.01 ± 0.91, 7.30 ± 0.89, and 8.48 ± 1.34, respectively (p < 0.001). The AUROCs of LHet, LNod, and LHet × LNod for differentiating F1 vs. F2 and F2 vs. F3 were 0.845, 0.958, and 0.954; and 0.619, 0.689, and 0.761, respectively. The combination of LHet and LNod scores derived from routine MR images allows better differential diagnosis of fibrosis subgroups in CLD.


Introduction
Liver fibrosis is a hallmark of chronic liver disease (CLD) characterized by excessive accumulation of extracellular matrix proteins responsible for fibrogenesis [1,2]. Liver fibrosis may progress to cirrhosis, the end stage, which constitutes the most important risk factor for developing hepatocellular carcinoma (HCC) [3]. Liver biopsy has been regarded as the reference diagnostic method for evaluating the stage of liver fibrosis in CLD [4]. However, this method has well-known weaknesses including sampling errors, low patient acceptance, and complications such as pain, bleeding, infection, and rarely death [5]. Moreover, in the CLD patients with an initial diagnosis of early stage fibrosis (compensated liver) or cirrhosis (decompensated liver as end stage), it is difficult to accurately predict hepatic compensation or decompensation using noninvasive methods [6]. Thus, there is an unmet need for widely applicable noninvasive methods to diagnose fibrosis and advanced cirrhosis and to predict future risk of hepatic decompensation.
Recently, there have been considerable efforts to develop imaging techniques and quantification programs for diagnosing and staging liver fibrosis. These methods include contrast-enhanced imaging, elastography, image-based morphologic analysis, and texture analysis [7][8][9]. Among them, image-based morphologic analysis includes quantification of parenchymal heterogeneity, contour change caused by liver nodules, atrophic or necrotic change, edge blunting, and fissural widening [10]. Several quantification software programs have been introduced for assessing the findings of liver fibrosis and cirrhosis on medical images [10][11][12]. Heterogeneity quantification programs using coefficient of variation (CV) maps can help to assess the severity of fibrosis and cirrhosis in patients with chronic hepatitis B [10,11]. Previous studies [10,11] reported that the areas under the receiver operating characteristic curve (AUROCs) of the magnetic resonance (MR) CV map scores were 0.875 for discriminating significant fibrosis (≥fibrosis grade 2; F2) in chronic hepatitis B and 0.788 for the presence of HCC in patients with liver cirrhosis (F4). Moreover, liver surface nodularity (LSN) can be useful for differentiating the severity of fibrosis. One study [12] reported an AUROC of 0.788 for discriminating significant fibrosis (≥F2) in nonalcoholic fatty liver disease (NAFLD). Other studies showed that the AUROCs of computed tomography (CT) LSN scores were 0.902 and 0.959 for discriminating significant fibrosis (≥F2) and cirrhosis (F4), respectively [13,14]. A comparative study using MR LSN and MR elastography demonstrated that the AUROCs for diagnosing significant fibrosis (≥F2) were 0.61 for MR LSN and 0.87 for MR elastography [15]. However, these studies used a single measurement, either liver heterogeneity (LHet) or nodularity (LNod), for evaluating liver fibrosis, and a single measurement did not clearly distinguish each fibrosis stage from the others.
Taking these findings together, computer-aided LHet and LNod scores can provide important information for the differential diagnosis of hepatic fibrosis. However, the integrative analysis of LHet and LNod for evaluating liver fibrosis is not yet fully understood.
For this study, we developed integrated semiautomated quantification software for assessing LHet and LNod and compared the scores across fibrosis stages in CLD.

Ethics Statement
The study protocol was approved as retrospective research (WKUH-2017-03-026) by the institutional review board (IRB) of University Hospital. Written informed consent was waived by the University Hospital IRB committee due to the use of anonymous archival data including MRI data (radiology_common data model: R_CDM, version 2.0.0) and electronic health records (observational medical outcomes partnership-CDM: OMOP-CDM, version 5.3) for the application of developed software. This study was conducted in accordance with the Helsinki Declaration and Good Clinical Practice guidelines.

Subject Population
Among the 1654 consecutive patients who underwent radiological examination at our institution from April 2003 to December 2018, patients 20 years or older who underwent abdominal MRI at 3.0 T and who had available serologic tests within five months of MRI were retrospectively identified. Of 121 eligible patients, 10 were excluded due to the absence of MR images for liver protocols and the absence of medical records for CLD (Figure 1). The inclusion criteria for CLD patients were elevated liver function enzymes, namely alanine transaminase (ALT) and aspartate transaminase (AST), and the absence of liver cirrhosis (F4) [14]. The CLD patients were divided into three fibrosis groups according to the serum-based fibrosis-4 index (FIB-4, Equation 1) as follows: F1, mild fibrosis group < 1.45; F2, significant fibrosis group 1.45-3.25; and F3, advanced fibrosis group > 3.25 (Table 1). Finally, the subgroups consisted of 9 F1 (mean age, 50.3 ± 14.9 years), 57 F2 (60.3 ± 12.1 years), and 45 F3 (64.8 ± 13.6 years) patients. For comparison, this study also included 16 subjects (35.0 ± 15.5 years) with suspected liver disease who underwent needle biopsy (Figure 1). These individuals had symptoms of fatigue and inactivity. They had abnormal liver function tests, but there was no histological evidence of liver fibrosis or advanced cirrhosis (no fibrosis group, F0).
The upper limit of normal AST was 35 in this study.
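The FIB-4 grouping described above can be illustrated with a minimal sketch. It assumes the widely used FIB-4 definition, (age × AST) / (platelet count × √ALT), with AST and ALT in U/L and platelets in 10^9/L. The function names and the example patient values are hypothetical, and the handling of values exactly equal to 1.45 or 3.25 is an assumption.

```python
import math


def fib4_index(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def fib4_stage(fib4):
    """Map a FIB-4 value to the three CLD groups used in this study."""
    if fib4 < 1.45:
        return "F1"  # mild fibrosis
    if fib4 <= 3.25:
        return "F2"  # significant fibrosis (boundary handling is assumed)
    return "F3"      # advanced fibrosis


# Hypothetical patient, for illustration only.
value = fib4_index(age_years=60, ast_u_l=80, alt_u_l=64, platelets_1e9_l=150)
print(round(value, 2), fib4_stage(value))  # prints: 4.0 F3
```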

Software for Quantification of Liver Heterogeneity and Nodularity
LHet and LNod quantification software (customized software named WALTS) was coded in MATLAB (MathWorks, Natick, Massachusetts). Wonkwang Abdomen and Liver Total Solution (WALTS) is a customized semiautomated postprocessing program that operates on the Windows platform (client version: XP or higher; Microsoft, Redmond, WA). We used WALTS to process MR images in the DICOM (Digital Imaging and Communications in Medicine) format to generate the LHet and LNod scores using a previously described procedure [13,14,16]. Figure 2 shows the graphical user interface (GUI) of WALTS and a simple flowchart of the algorithm developed for qualitative and quantitative analysis. The procedures for quantifying LHet and LNod scores were as follows: bias correction of field uniformity, liver contour detection for drawing the liver reference line, liver segmentation, and LHet and LNod measurements. Figure 3 shows the overall image postprocessing procedures for hepatic heterogeneity (LHet) and nodularity (LNod) quantification using MR images.
Figure 2. Graphical user interface (GUI) of a tailor-made quantification software for assessing liver heterogeneity (LHet) and nodularity (LNod) (left side) and flowchart for quantifying liver fibrosis (right side).

To automatically detect the liver's contour, we used a novel region-based level set method for liver segmentation, which provides a local clustering criterion function with correction for intensity inhomogeneities (Figure 4B) [17]. The boundary detection and segmentation techniques maximize the local intensity clustering property and minimize the energy formulation to determine and exclude any signal outliers caused by systematic artifacts, as described in previous studies [12,18]. The contour line in the selected slice of the liver was produced after bias correction. Following preprocessing of the MRI data, the liver surface line for LHet and LNod quantification was extracted as a reference line, and the extracted line was confirmed by two abdominal radiologists (with 29 and 8 years of experience in abdominal imaging) (Figure 4C). Five circular regions of interest (ROIs; each 40 pixels) for LHet measurement were drawn on the liver parenchyma. In all subjects, ROIs were placed on the liver parenchymal areas with no overlap over large vessels or focal lesions. The LHet score and LHet map were calculated using Equations (2) and (3).
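The ROI-based heterogeneity measurement can be illustrated with a minimal sketch. It assumes that the LHet score is the coefficient of variation (SD/mean × 100) averaged over the circular ROIs, which follows the CV-map approach cited in the Introduction and is not a reproduction of Equations (2) and (3). The function names, the ROI radius, the ROI positions, and the synthetic images are all hypothetical.

```python
import numpy as np


def circular_mask(shape, center, radius):
    """Boolean mask of a circular ROI centered at (row, col)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2


def roi_cv_percent(image, center, radius):
    """Coefficient of variation (%) of the pixel intensities inside one ROI."""
    values = image[circular_mask(image.shape, center, radius)]
    return 100.0 * values.std(ddof=1) / values.mean()


def lhet_score(image, centers, radius):
    """Assumed LHet score: mean CV (%) over all circular ROIs."""
    return float(np.mean([roi_cv_percent(image, c, radius) for c in centers]))


# Synthetic "parenchyma" images with low and high intensity variation.
rng = np.random.default_rng(0)
smooth = 100.0 + rng.normal(0.0, 5.0, size=(128, 128))
coarse = 100.0 + rng.normal(0.0, 15.0, size=(128, 128))
rois = [(30, 30), (30, 90), (64, 64), (90, 30), (90, 90)]  # five hypothetical ROIs
print(lhet_score(smooth, rois, radius=10), lhet_score(coarse, rois, radius=10))
```

Under this assumption, a higher score corresponds to a more heterogeneous parenchymal signal within the ROIs.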

To measure the LNod score, the liver parenchyma within the confirmed liver boundary line was used for the multipolynomial curve-fitting analysis. ROIs for LNod measurement were selected along the contour of the liver (Figure 4G). The user enters an LNod ROI range across the data points of the liver surface line. After input of an LNod ROI range, a smooth curve-fitting line (polynomial line shape) was generated on the selected ROI dataset (Figure 4H). Finally, the difference between the liver surface line and the new polynomial curve-fitting line (one of second-, third-, and fourth-order line shapes) was evaluated on a pixel-by-pixel basis. The difference value was squared and then used to calculate the mean, variance, and standard deviation (SD). The final LNod score in an individual subject was calculated as the mean LNod obtained from the measurements on the ROIs.
In addition, a combined score derived from LHet and LNod was calculated as the product of both scores (LHet × LNod).

Data Processing and Quantification of MRI in CLD
All MR studies were reviewed on standard picture archiving and communication system (PACS) stations and software with standard window settings. The liver MR images of each CLD patient were assessed by two abdominal radiologists, who were blinded to clinical outcome, using WALTS software. After opening the DICOM images in the software, they selected image slices at the level of the hepatic hilum. Then, bias correction and segmentation were performed on the selected images. For LHet measurement in each subject, five circular ROIs were manually drawn in the liver parenchyma (Figure 5), avoiding the perceivable bile duct, major intrahepatic vessels, subcapsular area, and focal lesions such as cysts or benign and malignant tumors [11]. The final LHet and LNod scores in each fibrosis group were calculated as the average of the scores reported by the two observers (observer A: YRK, observer B: YHL) for AUROC differential diagnosis according to fibrosis stages.
All the measurements on selected MR images (the level of the hepatic hilum) were repeated two weeks after the first measurement was obtained to evaluate intraobserver agreement. Furthermore, to determine interobserver agreement, both radiologists independently measured LHet and LNod scores on selected images. The intra- and interobserver variability in the LHet and LNod measurements was assessed. The overall scores of LHet and LNod were calculated as the mean score of the three measurements taken for each patient. The WALTS program for LHet and LNod quantification used MR images in DICOM format to generate the LHet and LNod scores. The technical details for obtaining the LHet and LNod measurements were described in recent papers [10][11][12][13]. Measurements of at least three and/or four ROIs were performed for each subject. Final LHet and LNod scores were calculated by the program as a mean value of the individual measurements, with a higher LHet (or LNod) score indicating a higher degree of parenchymal heterogeneity (or nodularity). Figure 4 shows the representative images in LHet and LNod measurements in an axial MR image.
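The polynomial curve-fitting step used for the nodularity measurement can be sketched as follows. This is an illustration under stated assumptions, not the WALTS implementation: the liver surface line inside an ROI is assumed to be available as single-valued (x, y) pixel coordinates, the per-ROI LNod value is assumed to be the mean of the squared differences, and all function names and the synthetic contours are hypothetical.

```python
import numpy as np


def lnod_roi(x, y, order):
    """Fit a polynomial of the given order (2, 3, or 4 in the text) to the
    surface line and summarize the squared pixel-by-pixel differences."""
    coeffs = np.polyfit(x, y, order)
    fitted = np.polyval(coeffs, x)
    sq_diff = (y - fitted) ** 2
    return {
        "mean": float(sq_diff.mean()),
        "variance": float(sq_diff.var(ddof=1)),
        "sd": float(sq_diff.std(ddof=1)),
    }


def lnod_score(rois, order=3):
    """Assumed subject-level LNod: mean of the per-ROI mean squared differences."""
    return float(np.mean([lnod_roi(x, y, order)["mean"] for x, y in rois]))


def combined_score(lhet, lnod):
    """Combined score, defined in the text as LHet x LNod."""
    return lhet * lnod


# Synthetic contours: a smooth arc and the same arc with small surface bumps.
x = np.arange(100, dtype=float)
smooth_line = 0.01 * (x - 50.0) ** 2
nodular_line = smooth_line + 1.5 * np.sin(x / 3.0)
print(lnod_score([(x, smooth_line)], order=2), lnod_score([(x, nodular_line)], order=2))
```

With these synthetic contours, the bumpy line deviates more from its fitted polynomial and therefore yields the larger value, which matches the interpretation that a higher LNod score indicates a more nodular surface.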

Statistical Analysis
The LHet and LNod scores among the three fibrosis stages in CLD were compared using SPSS version 20.0 (SPSS Inc., Chicago, IL, USA). The variation in LHet and LNod scores was analyzed using analysis of variance (ANOVA) with Tukey's post hoc test. The difference between CLD patients and the control group was analyzed using the independent two-sample t-test. Intraobserver agreement (between measurements from the same observer) was calculated as the mean coefficient of variation (%) of the LHet and LNod scores taken by the same observer [19]. Also, the difference between the scores of both observers was analyzed with the paired t-test. Interobserver agreement was assessed on the basis of the intraclass correlation coefficient (ICC) between the LHet and LNod scores of the two observers. The ICCs were interpreted according to the following levels of reliability [20]: poor (<0.4), moderate (0.4-0.6), good (0.6-0.8), and excellent (0.8-1.0).
The diagnostic performance of LHet, LNod, and LHet × LNod scores according to fibrosis stages was evaluated with ROC curve analysis including the AUROC, sensitivity, and specificity. Two-sided p-values less than 0.05 were considered to denote statistical significance in all tests.

Results
Figure 1 shows the inclusion flowchart for the study population. The etiology of liver fibrosis in CLD and the average enzyme levels according to fibrosis stages are listed in Table 1. The serum biochemistry showed significant differences among the three groups in the levels of alkaline phosphatase (ALP, p = 0.002), gamma-glutamyl transpeptidase (GGT, p = 0.001), and platelet count (p < 0.001). However, there was no significant difference among the fibrosis groups in albumin (p = 0.873), alanine aminotransferase (ALT, p = 0.224), aspartate aminotransferase (AST, p = 0.058), or bilirubin (p = 0.058).

Liver Heterogeneity and Nodularity Measurements in CLD
MRI data for 111 CLD patients and 16 subjects with suspected liver disease were analyzed with the developed WALTS software. Figure 3 shows the detailed image processing procedures. Figure 4 shows a representative MR image of a patient with CLD (Figure 4A), the bias-corrected MR image (Figure 4B), liver contour detection (Figure 4C), region of interest (ROI) drawing on the MR image (Figure 4D), the binary image of the ROI for liver segmentation (Figure 4E), the segmented liver MRI (Figure 4F), the final reference line (Figure 4G), and curve-fitting lines (Figure 4H) for LHet and LNod measurements. Figure 5 shows the representative quantification images in each fibrosis stage (F1-F3), and the results are summarized in Table 2. Mean LHet, LNod, and LHet × LNod scores in CLD were higher than those in the F0 group (p < 0.001). Mean LHet, LNod, and combined LHet × LNod scores were significantly different between fibrosis stages (F1-F3) (ANOVA; p < 0.001). In multiple comparisons, LHet scores differed between each pair of stages (F1 vs. F2, p = 0.032; F1 vs. F3, p < 0.001; and F2 vs. F3, p = 0.008). Additionally, LNod scores differed between each pair of stages (F1 vs. F2, p < 0.001; F1 vs. F3, p < 0.001; and F2 vs. F3, p = 0.022). The combined scores were significantly different in the contrasts of F1 vs. F2, F1 vs. F3, and F2 vs. F3 (p < 0.001). Thus, there were significant differences among fibrosis stages based on quantitative LHet, LNod, and LHet × LNod scores.
Table 2 footnotes: Data are presented as mean ± SD. The final LHet and LNod scores in each fibrosis group were calculated as an averaged score obtained by reporting scores of two observers (observer A: Y.R.K., observer B: Y.H.L.) for AUROC differential diagnosis according to fibrosis stages. The average scores in the F0 group were used as the reference ranges. * The difference among the three fibrosis groups was analyzed by one-way ANOVA with Tukey post hoc test as follows: (a) F1 vs. F2, (b) F1 vs. F3, and (c) F2 vs. F3.
Figure 6 shows AUROC curves of LHet, LNod, and LHet × LNod scores for the discrimination of fibrosis stages, and the results are summarized in Table 3. The AUROCs of LHet × LNod scores were 0.954 for F1 vs. F2 (95% CI, 0.884-1.000; p < 0.001), 0.761 for F2 vs. F3 (95% CI, 0.669-0.853; p < 0.001), 0.984 for F0-1 vs. F2 (95% CI, 0.960-1.000; p < 0.001), and 0.999 for F0-1 vs. F3 (95% CI, 0.996-1.000; p < 0.001). For F1 vs. F2, sensitivity was 0.895 and specificity was 0.889 at a cut-off LHet × LNod score of 6.10; for F2 vs. F3, sensitivity was 0.667 and specificity was 0.684 at a cut-off of 7.85; for F0-1 vs. F2, sensitivity was 0.911 and specificity was 0.920 at a cut-off of 5.98; and for F0-1 vs. F3, sensitivity was 0.978 and specificity was 1.000 at a cut-off of 6.70.
Table 3. Receiver operating characteristic curve analysis for diagnosing fibrosis stage using liver heterogeneity (LHet) and nodularity (LNod) scores.

Intraobserver and Interobserver Agreement
The intra- and interobserver variabilities of LHet and LNod scores from the two observers are summarized in Table 4. For intraobserver variability, the mean coefficient of variation within the same observer was in the range of 13-28% for LHet measurements and 4-14% for LNod measurements. Also, there was no significant difference between the averaged LHet and LNod values of the two observers (p > 0.05). For interobserver variability, ICCs were higher than 0.6, indicating good reliability. The ICCs (range: 0.601-0.852) were 0.718 for LHet measurements and 0.832 for LNod measurements. The overall LHet and LNod measurements of both observers showed good agreement (p < 0.05).
Table 4. Intra- and interobserver variability in liver heterogeneity (LHet) and nodularity (LNod) measurements according to fibrosis stages. Abbreviations: ICC: intraclass correlation coefficient; CI: confidence interval. LHet and LNod scores of each observer are presented as means ± SD (mean coefficient of variation, %). * The differences between both observers in LHet and LNod scores were assessed by the paired t-test. † The interrater reliability between both observers was assessed by the intraclass correlation coefficient (ICC).
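The two agreement summaries used above can be sketched as follows. The sketch assumes that the mean coefficient of variation is obtained by computing SD/mean × 100 over the repeated measurements of each subject and then averaging across subjects; this is one plausible reading of the Statistical Analysis section, not a confirmed definition. The function names and the example values are hypothetical, and the assignment of boundary values (0.4, 0.6, 0.8) to the higher category is also an assumption.

```python
import numpy as np


def mean_cv_percent(repeated_scores):
    """Mean within-subject coefficient of variation (%).

    repeated_scores: array-like of shape (n_subjects, n_repeats) holding the
    repeated measurements made by a single observer."""
    scores = np.asarray(repeated_scores, dtype=float)
    cv = 100.0 * scores.std(axis=1, ddof=1) / scores.mean(axis=1)
    return float(cv.mean())


def icc_reliability(icc):
    """Reliability label following the ICC levels given in the text [20]."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "moderate"
    if icc < 0.8:
        return "good"
    return "excellent"


# Hypothetical repeated LHet measurements of three subjects by one observer.
first_and_second_reading = [[5.1, 6.0], [6.9, 6.2], [7.4, 8.8]]
print(mean_cv_percent(first_and_second_reading))
print(icc_reliability(0.718), icc_reliability(0.832))
```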

Discussion
In this study, we developed an integrated system (semiautomated WALTS software) for evaluating LHet and LNod in liver diseases and compared the fibrosis-stage subgroups of CLD patients using retrospective routine MRI datasets with serologic laboratory tests. Liver MR images acquired with the three-dimensional THRIVE pulse sequence (routine T1 MR images) demonstrated acceptable accuracy in diagnosing fibrosis stages of CLD patients. LHet, LNod, and LHet × LNod scores in CLD patients were higher than those in the F0 group. For differentiating F1 from F2 fibrosis, the AUROCs were 0.845 for LHet, 0.958 for LNod, and 0.954 for LHet × LNod. For differentiating F2 from F3, the AUROCs were 0.619 for LHet, 0.689 for LNod, and 0.761 for LHet × LNod. Smith et al. [13] and Pickhardt et al. [14] reported that the LNod diagnostic accuracy using CT images is excellent for predicting fibrosis (≥F2) or cirrhosis (F4) (0.910 and 0.959 AUROC, respectively). Furthermore, Lee et al. [10] reported that the mean LHet values showed good discrimination for staging of significant fibrosis (≥F2) in chronic hepatitis B (AUROC: 0.875 for the aspartate aminotransferase to platelet ratio index (APRI) and 0.831 for FIB-4). In the present study, the LHet and LNod scores in the CLD patients are in accordance with these previous results [10,11,13,14], confirming patients with significant fibrosis (≥F2) and/or precirrhotic hepatic fibrosis.
This study also investigated the variation in LHet and LNod measurements and the interobserver agreement. To reliably detect signals from the liver parenchyma and surface, all T1-weighted MRI data underwent bias correction of field inhomogeneity before liver contour detection. Quantitative LHet and LNod scores showed reliable measurements, with an averaged CV value <25%. The LHet and LNod scores measured by the two observers showed good interobserver agreement (ICC > 0.6), indicating reproducibility. Thus, the WALTS software-based LHet and LNod measurements can be reproducible in clinical MR images. However, the most accurate test for assessing liver fibrosis is currently MR elastography (MRE), which has ICC >95% and accuracy >90% in liver stiffness measurements [21]. In our study, LHet and LNod had ICCs of 0.72 and 0.83, respectively, which are considerably lower than those of MRE. Thus, further study is needed to develop a more accurate quantification method for LHet and LNod measurements.
With regard to the grading of liver fibrosis, the APRI and FIB-4 serologic indices are well known [22,23]. In a meta-analysis [22], the pooled ROC values of the FIB-4 index were 0.74-0.84 in patients with chronic hepatitis B virus infection. The summary ROC (SROC) values of FIB-4 were higher than those of APRI for advanced fibrosis and cirrhosis. Two systematic reviews [23,24] reported that the SROC values for the accuracy of APRI in patients with hepatitis C (HCV) or coinfection of HCV/human immunodeficiency virus (HIV) were 0.76-0.77 for significant fibrosis, 0.80 for advanced fibrosis, and 0.82-0.83 for cirrhosis. Based on these findings, the FIB-4 index has similar or superior diagnostic accuracy to that of APRI for diagnosing liver fibrosis and cirrhosis. For this study, we used the FIB-4 scoring system based on serum ALT, AST, and platelet levels. A notable feature of this study is that the LHet, LNod, and LHet × LNod scores were significantly different among fibrosis stages. The mean LHet, LNod, and LHet × LNod scores in the more severe fibrosis stages F2 and F3 were significantly higher than those in mild fibrosis F1 (as shown in Table 2). Thus, quantified LHet, LNod, and LHet × LNod scores can provide information for diagnosing hepatic fibrosis. In previous studies, several imaging methods were reported for differentiating hepatic fibrosis and cirrhosis. A study [25] compared the diagnostic accuracy between gadoxetic acid-enhanced MR imaging, transient elastography, and ultrasound shear wave elastography point quantification (ElastPQ). The gadoxetic acid-enhancement index showed similar diagnostic accuracy for significant fibrosis (≥F2) or cirrhosis (F4) when using transient elastography (AUROC 0.866 and 0.884) or ElastPQ (AUROC 0.751 and 0.786), respectively.
A comparative study [26] using hepatobiliary phase imaging (relative enhancement), susceptibility-weighted imaging (SWI; liver-to-muscle ratio), and diffusion-weighted imaging (DWI; apparent diffusion coefficient: ADC value) reported that the AUROC of SWI was higher for diagnosing cirrhosis (F4) than those of hepatobiliary phase imaging and DWI (0.92 vs. 0.80 and 0.79), and that the AUROC of the combination of all three was the highest for diagnosing cirrhosis (0.93). A recent study [27] of gadoxetic acid-enhanced MR imaging using a radiomics model based on texture analysis reported that the AUROCs of the radiomic fibrosis index were 0.90, 0.89, and 0.91 for significant fibrosis, advanced fibrosis, and cirrhosis, respectively. In the present study, LHet and LNod scores have similar, excellent diagnostic accuracy for significant fibrosis (≥F2). Thus, LHet and LNod quantification can be a noninvasive technique capable of detecting fibrotic changes within the liver parenchyma in CLD. The major strengths of the integrated WALTS program include the ability to evaluate previously obtained liver MR or CT images (useful for retrospective large-scale population studies), wide availability of MR and CT imaging, no requirement for intravenous contrast media injection, and no additional hardware requirements for image acquisition procedures. Moreover, the LHet and LNod quantification program may help predict cirrhosis, liver compensation, and death [16]. Therefore, this MRI-compatible WALTS software may be useful for clinical application to various liver diseases including CLD.
Diagnostic accuracy, reproducibility, and repeatability in LHet and LNod measurements are crucial for assessing the diagnostic performance of an imaging technique [28]. The LHet and LNod scores derived from routine liver MR images showed good reproducibility between two different observers in diagnosing hepatic fibrosis. WALTS software can quantify axial 3D-THRIVE MR images in less than 5 min. The applicability of WALTS to retrospective clinical studies is a major advantage, since it allows us to predict liver fibrosis and compare disease progression during prospective long-term follow-up studies.
This study had several limitations. First, this was a retrospective study with a relatively small population, and it dealt with CLD patients with heterogeneous underlying disease causes, as given in Table 1. The patient cohort largely included hepatitis B and C. However, the heterogeneous disease causes in the enrolled subjects were not taken into account. Thus, the imaging findings and the predictive power in a larger cohort might differ in patients without viral hepatitis. Also, the skewed distribution of the subgroups (the F0 group comprised 16/127 subjects (12.6% of the study population) and the F1 group only 9/127 subjects (<10%)) might have skewed the results and can lead to spectrum bias in the study population. A future validation study in another, larger cohort with evenly sized subgroups is needed to strengthen the translational impact. Although we included a pathologically confirmed F0 group for comparison, this study used only the FIB-4 index as the reference standard to stage liver fibrosis in CLD. Further research would be useful to directly compare our method with FibroScan, which currently represents the most utilized technology in the evaluation of fibrosis. The FIB-4 index is good for distinguishing cirrhosis from lower fibrosis stages, and even then it has modest accuracy. Thus, this index might lead to false-positive or false-negative findings due to its moderate discrimination accuracy and its own limitations. Second, this study did not consider the relationship between LHet and LNod scores and complications of liver fibrosis. Lee et al. [10] reported that quantified LHet scores are correlated with serologic indices, reflecting liver functional status. Smith et al. [13] reported that a single LNod score allows the prediction of decompensated cirrhosis and death. Sartoris et al. [29] reported that portal hypertension can be detected using a CT-based LNod score with a high degree of reliability.
Considering these findings, future studies are needed to investigate the correlation between LHet and LNod scores and complications of liver fibrosis. Third, the reproducibility of the LHet and LNod measurements was tested at a single center. Although the findings showed good reproducibility, the LHet score could be influenced by image noise. Further studies are needed for external validation or cross-validation on diverse, large-scale datasets across modalities, vendors, and study protocols at multiple centers. Fourth, this study focused only on fibrosis stages F0-F3. Therefore, further study including patients with liver cirrhosis (F4) is needed to clarify the findings for actual clinical settings and practices. Also, a future prospective study with a larger population that incorporates cirrhotic patients (F4) may offer further insights on how this new methodology may expand the current standard of care.

Conclusions
This study developed integrated semiautomated software for the quantification of hepatic heterogeneity and nodularity, and the measurements of LHet and LNod scores are reproducible in assessing fibrosis stage in CLD. The combination of quantitative LHet and LNod scores may be more useful than either score alone for differentially diagnosing the fibrosis stage in CLD using routine MR images.

Institutional Review Board Statement: The study protocol was approved as retrospective research (WKUH-2017-03-026) by the institutional review board (IRB) of Wonkwang University Hospital.
Informed Consent Statement: Written informed consent was waived by the University Hospital IRB committee due to the use of anonymous archival data including MRI data (radiology_common data model: R_CDM, version 2.0.0) and electronic health records (observational medical outcomes partnership-CDM: OMOP-CDM, version 5.3) for the application of developed software.
Data Availability Statement: All anonymized data sources described in this study are available from the corresponding author on reasonable request.