Machine-Learning Algorithm for Predicting Fatty Liver Disease in a Taiwanese Population

The rising incidence of fatty liver disease (FLD) poses a health challenge, and FLD is expected to be the leading global cause of liver-related morbidity and mortality in the near future. Early case identification is crucial for disease intervention. A retrospective cross-sectional study was performed on 31,930 Taiwanese subjects (25,544 in the training set and 6386 in the testing set) who had received health check-ups and abdominal ultrasounds in Changhua Christian Hospital from January 2009 to January 2019. Clinical and laboratory factors were included for analysis by different machine-learning algorithms. In addition, the performance of the machine-learning algorithms was compared with that of the fatty liver index (FLI). In total, 6658/25,544 (26.1%) and 1647/6386 (25.8%) subjects had moderate-to-severe fatty liver disease in the training and testing sets, respectively. Five machine-learning models were examined and demonstrated exemplary performance in predicting FLD. Among these models, the xgBoost model achieved the highest area under the receiver operating characteristic curve (AUROC) (0.882), accuracy (0.833), F1 score (0.829), sensitivity (0.833), and specificity (0.683) compared with those of the neural network, logistic regression, random forest, and support vector machine models. The xgBoost, neural network, and logistic regression models had a significantly higher AUROC than that of the FLI. Body mass index was the most important feature for predicting FLD according to the feature ranking scores. The xgBoost model had the best overall prediction ability for diagnosing FLD in our study. Machine-learning algorithms provide considerable benefits for screening candidates with FLD.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is a hepatic complication of metabolic syndrome. With the elimination of hepatitis C and effective vaccination against hepatitis B, NAFLD is becoming the most common chronic liver disease in the world, affecting 22.28%-51.04% of the Asian population [1]. Identifying potential patients at increased risk of developing NAFLD [2] is important for early medical interventions to reduce their subsequent risk of developing liver cirrhosis and hepatocellular carcinoma [1]. The gold standard for diagnosing NAFLD is liver biopsy, which is invasive and not suitable for screening purposes. Noninvasive imaging methods, such as the controlled attenuation parameter (CAP) measurement with the FibroScan [3][4][5][6] or abdominal ultrasound, are able to screen patients for the presence of NAFLD and stage its severity. However, the routine use of these imaging tools may not be cost-effective, and they are not widely available to primary care physicians. An ideal tool should allow for screening to be performed on a day-to-day basis, and, more importantly, increase the acceptance of screening by the physicians performing it. The fatty liver index (FLI) [7,8] was developed and validated with routine laboratory values for screening purposes when an ultrasound is not available. The FLI includes four parameters but cannot easily be computed by hand. Machine learning (ML) has recently been introduced to manage large amounts of data with the use of a computer that studies interactions between variables through the minimization of errors between the predicted and actual outcomes. Several ML techniques, such as logistic regression (LR), random forest (RF), artificial neural networks (ANNs), support vector machines, and extreme gradient boosting (xgBoost), show promise in improving predictions compared with conventional risk scoring systems.
Several previous studies used ML methods and reported a higher diagnostic value for the presence of fatty liver disease with clinical variables [9][10][11][12][13][14][15]. However, these studies used datasets of limited size, and most of them did not validate their models on an additional testing dataset. In this study, we utilized a health checkup database with a large Taiwanese population to evaluate the potential usefulness of different types of machine-learning algorithms. The performance of the machine-learning algorithms was compared with that of the well-known fatty liver index (FLI).

Patients and Data Preparation
The study population was recruited from adults (≥20 years old) who had received health examinations in the Healthcare Center of Changhua Christian Hospital between January 2009 and January 2019. Enrollment was limited to participants who had complete records of clinical and biochemical data and a liver ultrasonography report on the presence of fatty liver disease. Patients with an ultrasound finding of hepatic malignancy, liver cirrhosis, ascites, or features of alcoholic liver disease were excluded. A total of 31,930 subjects who fulfilled the criteria were included in the study.
This study only accessed deidentified data retrospectively, so the requirement for informed consent was waived; the study was approved by the Institutional Review Board of Changhua Christian Hospital (approval number: CCH IRB 191012).

Diagnosis of Fatty Liver Disease (FLD)
The diagnosis of fatty liver disease requires the presence of significant hepatic steatosis confirmed by ultrasound examination. Three experienced sonographers who were unaware of the patients' clinical and laboratory data performed the hepatic ultrasonography examinations during the study period. An ultrasound finding of moderate-to-severe fatty liver was defined as the presence of fatty liver disease.

Machine-Learning Model Construction and Validation
The dataset was randomly divided into a training set and a testing dataset at a ratio of 8:2. The training was performed with 10-fold cross-validation of the training data. The performance of the developed model was evaluated on the testing dataset. All clinical and biochemical data from the participants were used to build five models to predict the presence of FLD: extreme gradient boosting (xgBoost), logistic regression (LR), neural network (NN), random forest (RF), and support vector machine (SVM) models.
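The splitting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure run inside the software used in the study: the function names `train_test_split_indices` and `k_fold`, the fixed random seed, and the unstratified shuffling are all hypothetical choices made for the example.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary, illustrative choice
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validate = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validate


# 31,930 subjects at a ratio of 8:2, as in the text
train_idx, test_idx = train_test_split_indices(31930)
print(len(train_idx), len(test_idx))

# Each of the 10 rounds fits on nine folds and validates on the held-out fold;
# the testing set is never touched during cross-validation.
for fit, validate in k_fold(train_idx, k=10):
    pass  # a model would be fitted on `fit` and scored on `validate` here
```

With 31,930 subjects, an 8:2 split yields 25,544 training and 6386 testing records, which matches the set sizes reported in this study.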

Performance Metrics
The six evaluation indicators of the area under the receiver operating characteristic curve (AUROC), accuracy, recall, F1 score, precision, and specificity were evaluated to compare the performance of the five models [16]. The AUROC is a popular and strong metric in evaluating binary classifiers. The AUROCs of the fatty liver index [8] and the developed machine models were compared.
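As an illustration of how these indicators relate to one another, the following Python sketch computes them for a binary classifier from first principles. The helper names (`classification_metrics`, `auroc`), the toy labels and scores, and the 0.5 decision threshold are assumptions made for the example; they are not taken from the study, and the software used in the study may average per-class values differently.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision, specificity and F1 for binary labels (1 = FLD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0        # also called sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(classification_metrics(y_true, y_pred))
print(auroc(y_true, scores))
```

Unlike the other five indicators, the AUROC is computed from the continuous scores rather than from thresholded predictions, which is why it can be compared directly between a model and an index such as the FLI.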

Statistical Analysis
In our experiment, machine-learning training and testing were performed with the Orange Data Mining platform [17]. Baseline data were analyzed using IBM SPSS version 28.0 (IBM Corp., Armonk, NY, USA) and the medical statistical software MedCalc Version 19.8 (© 2022 MedCalc Software Ltd.). Results were considered to be statistically significant if the two-tailed p-value was <0.05 for all tests.

Characteristics of the Participant Population
A total of 31,930 subjects who fulfilled the criteria were included in the study, and 26% of the study population had moderate-to-severe fatty liver disease on abdominal ultrasound examination. Table 1 illustrates the clinical features of the fatty-liver and non-fatty-liver populations. A total of 27 features were obtained from the database, and all differed significantly between these two populations. The dataset was split into a training dataset and a testing dataset (Table 2) for machine-learning model training.

Results of Different Model Performance Metrics
As all features differed between the fatty and nonfatty groups, we included all of these variables in building the final model. The training of the machine-learning models was performed with Orange software (Version 3.31.1). Table 3 illustrates the performance metrics of the five machine-learning models. The xgBoost model had the highest AUROC for predicting the presence of fatty liver disease compared with that of the four other models. The SVM model had the worst performance metrics in predicting the presence of fatty liver disease. Figure 1 illustrates the top ten features contributing to the F1 score of the developed xgBoost model.

Comparison of the Performance of Machine-Learning Models and the Fatty Liver Index
The fatty liver index (FLI) is a conventional index developed to calculate the likelihood of fatty liver disease utilizing four clinical parameters: BMI, waist circumference, serum triglyceride, and serum gamma-glutamyl transpeptidase (γGT) levels:

$$\mathrm{FLI} = \frac{e^{z}}{1 + e^{z}} \times 100, \qquad z = 0.953 \times \log_e(\text{triglycerides}) + 0.139 \times \text{BMI} + 0.718 \times \log_e(\gamma\text{GT}) + 0.053 \times \text{waist circumference} - 15.745$$

A comparison of the developed machine-learning models and the FLI is illustrated in Table 4. The FLI had a statistically lower AUROC than those of the xgBoost and logistic regression models in the testing dataset, but a higher AUROC than that of the SVM model (Figure 2). Figure 3 illustrates the comparison of the precision-recall curves of the xgBoost model and the fatty liver index. The xgBoost model had a larger AUC in the precision-recall curve than that of the fatty liver index.
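To make the FLI formula given in this section concrete, a minimal Python sketch follows. The function name `fatty_liver_index` and the example values are hypothetical. The units (triglycerides in mg/dL, γGT in U/L, waist circumference in cm, BMI in kg/m²) are not stated in this section and are assumed here to be those conventionally used with the index.

```python
import math


def fatty_liver_index(triglycerides, bmi, ggt, waist):
    """Fatty liver index on a 0-100 scale.

    Assumed units (not stated in this section): triglycerides in mg/dL,
    BMI in kg/m^2, gamma-GT in U/L, waist circumference in cm.
    """
    z = (0.953 * math.log(triglycerides)
         + 0.139 * bmi
         + 0.718 * math.log(ggt)
         + 0.053 * waist
         - 15.745)
    # e^z / (1 + e^z) rewritten as 1 / (1 + e^-z); the two forms are equal
    return 100.0 / (1.0 + math.exp(-z))


# Example values invented for illustration only
print(round(fatty_liver_index(triglycerides=150, bmi=25, ggt=30, waist=90), 1))
```

Because the index is a logistic function of a weighted sum, it requires logarithms and an exponential, which is why it is typically evaluated in a spreadsheet or an app rather than by hand.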

Discussion
Our study compared the performance of machine-learning models and the fatty liver index for the diagnosis of fatty liver disease in a hospital setting of a Taiwanese population. Fatty liver disease was observed in 26.2% of our patients. To our knowledge, only a few studies reported the use of machine learning for the diagnosis of fatty liver disease, and our study utilized a large dataset for model training and testing. The machine-learning algorithms achieved better performance than that of the conventional fatty liver index.
After hepatitis B vaccination and hepatitis C elimination [18][19][20], fatty liver disease has become the most health-threatening liver disease in the world [1,21,22]. The prevalence of fatty liver disease has seen a rapid rise in the Asian population, with the highest prevalence in Iran (64.29%) and the lowest in Taiwan (30.79%) [1]. Thus, identifying patients at risk for harboring fatty liver disease is important for subsequent lifestyle intervention to prevent liver damage progression. An abdominal ultrasound is a radiation-free and noninvasive means of screening the liver for the presence of fatty liver disease. A good sensitivity (85%-96%) and a specificity of up to 98% can be achieved when moderate-to-severe fatty liver is detected by abdominal ultrasound [14]. Despite its usefulness, abdominal ultrasound is not available in every primary care setting or for population-based screening. In the 2022 clinical practice guideline by the American Association of Clinical Endocrinology and the American Association for the Study of Liver Diseases [23], screening high-risk patients (prediabetes, type 2 diabetes, obesity, and/or metabolic syndrome) with noninvasive biomarkers such as the fatty liver index is recommended, followed by referral for ultrasound examination. Such two-step screening is feasible, and may reduce screening time and costs [24]. A machine-learning model could be built into the hospital electronic system or offered as an internet app, which may further lower the barrier to clinical use.
Recent advances in the field of machine learning have improved the discovery of new biomarkers for disease diagnosis or helped in designing treatment plans [14][25][26][27]. Dundar et al. [28] proposed machine-learning-based surgical planning and found that it contributed significantly to positive outcomes in neurosurgery. Sakatani et al. [29] utilized a machine-learning approach to estimate human cerebral atrophy on the basis of metabolic status. Shiba et al. [30] identified high risk factors for COVID-19 infection and hospitalization utilizing UK Biobank data with machine-learning-based analysis. In addition, several previous works applied machine-learning methods for diagnosing NAFLD by utilizing electronic medical records or biochemical variables (Table 5). Even in the same ethnic population, the proportion of fatty liver disease could range from the 26.2% in our study to the 65.3% in a previously published study [15]. The rank of feature importance can, therefore, differ among studies. Thus, the high accuracy of the developed model [15] may not be applicable to other populations. Different studies utilized different features for model training, and their performance could not be compared head-to-head. Machine-learning models should be compared with other validated tools, such as the fatty liver index in the present study, to show their superiority. Our study evaluated five commonly utilized machine-learning models on the basis of available clinical and biochemical variables in a health checkup setting. We compared the predictive capability of these five machine-learning methods and confirmed that the xgBoost model demonstrated the best performance, with the highest AUROC (0.882). High accuracy was found in previous studies utilizing machine-learning methods [14,15,31,32], and the xgBoost model achieved the best performance in our study. The xgBoost model has many advantages over other machine-learning models.
For example, xgBoost performs a second-order Taylor expansion on the cost function for more accurate results. This model adds a regular term to the cost function to control the complexity of the model, simplifying it and preventing overfitting with improved training speed. In addition, xgBoost is a model based on the decision tree model, and it is more explanatory than neural networks and other algorithms are [33].
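For reference, the second-order approximation and the regular term mentioned above are usually written as follows in the gradient-boosting literature. The notation is not taken from the present study and is given here only as an assumed standard formulation: $l$ is the loss, $f_t$ the tree added at iteration $t$, $g_i$ and $h_i$ the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ the number of leaves, $w_j$ the leaf weights, and $\gamma$, $\lambda$ the regularization parameters.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2}\, \lambda \sum_{j=1}^{T} w_j^2
```

The term $\Omega(f_t)$ penalizes trees with many leaves or large leaf weights, which is how the complexity of the model is controlled.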
Our study has the strength of having had a large dataset for model training and testing as compared to those in previous reports. The use of big data may help avoid overfitting the trained model. Despite the utilization of a large dataset for model training in a previous study [33], there was no comparison of the model's performance with that of an existing validated index as in the present study. The use of the developed model may require validation before its application in clinical practice. In addition, our study included laboratory markers that are readily available in health check-ups in the majority of hospitals in Taiwan. Thus, our developed model may be more practicable in clinical practice.
Our findings were consistent with the findings of Atsawarungruangkit et al. [13], who demonstrated the superiority of a machine-learning model over the fatty liver index in predicting the presence of fatty liver disease, although the machine-learning model utilized more features than the fatty liver index did and could not be calculated with a calculator. The calculation of the fatty liver index is also not simple and requires the use of a spreadsheet or an internet app, which is similar to the use of a machine-learning model. Utilizing a machine-learning model with better performance could assist in effectively identifying fatty liver disease in future clinical practice.
There are several limitations to the present study. First, we could not incorporate clinical information that was not included in our database. The presence of diabetes mellitus, hepatitis B, hepatitis C, and medication use may influence the findings of the ML models in predicting the presence of fatty liver disease. Further studies are required to include this information to improve the models' performance. Second, the database did not include a history of alcohol consumption. Although we could exclude the presence of significant liver disease or alcoholic liver disease, we may have included a small proportion of patients with alcoholic fatty liver disease in our analysis. Thus, the final prediction of fatty liver disease may not be valid for other patient or ethnic populations. Third, our laboratory values did not include the level of uric acid, which is not a routine examination in our cohort for health check-ups. As uric acid level was identified as a potential marker for predicting fatty liver disease [32], the lack of this parameter may have influenced the predictive ability of our ML models. In addition, as this was a health check-up cohort, there were no biopsy data to confirm the extent of steatosis and the severity of liver fibrosis. Further studies including these parameters as features may improve such ML models by providing more information on disease severity, including the extent of steatosis and fibrosis, and on patient mortality risk.

Conclusions
The present study utilized a large dataset, and the xgBoost model had the best overall prediction ability for diagnosing FLD in our population. Furthermore, machine-learning algorithms provided considerable benefits for screening candidates with FLD.

Institutional Review Board Statement: The study was performed following the principles outlined in the Helsinki Declaration, and it was approved by the institutional review board of Changhua Christian Hospital (IRB Number: 191012).

Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available on reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.