Diagnostics
  • Article
  • Open Access

4 November 2025

Machine Learning-Based Prediction of Decompensation in Hepatitis B Virus-Related Cirrhosis

1 Department of Health Services Administration, China Medical University, Taichung 406, Taiwan
2 Division of Gastroenterology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 407, Taiwan
3 Big Data Center, China Medical University Hospital, Taichung 404, Taiwan
4 Renal Division, Department of Internal Medicine, China Medical University Hospital, Taichung 404, Taiwan
This article belongs to the Special Issue Diagnostic and Prognostic Markers in Liver Diseases

Abstract

Background/Objectives: Mortality among patients with cirrhosis rises sharply once they progress to the decompensated state, yet only a few studies to date have applied machine learning (ML) methods to predict decompensation in cirrhosis. In the present study, we developed ML models and evaluated their ability to predict different complications in patients with hepatitis B virus (HBV)-related cirrhosis. Methods: Data were extracted from the electronic health records of 50,047 patients who were tested and diagnosed with HBV at a tertiary hospital. Four algorithms (Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF)) were used, and a total of 32 ML models were trained and tested to predict variceal bleeding, ascites, jaundice, and multiple complications (≥2 complications) in patients with HBV-related cirrhosis. Two antiviral drugs were considered: entecavir (ETV) and lamivudine (LAM). Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and accuracy. Results: SVM and RF classifiers produced the best overall predictions of decompensation in HBV-related cirrhosis patients, with AUROCs ranging from 0.85 to 0.93 and accuracy scores between 0.77 and 0.88 for ascites, jaundice, and multiple complications. The LR and SVM algorithms performed best in differentiating ascites among ETV users, with AUROCs of 0.93 and 0.92 and accuracies of 0.88 and 0.86, respectively. Antiviral treatment (type, length of use, and adherence) and other routinely collected clinical information may serve as informative markers for differentiating decompensated cirrhosis. Conclusions: ML-based prediction of decompensation from electronic health records may assist clinicians in decision making. The findings also underline the role of antiviral therapy as a key predictor of decompensation.

1. Introduction

Chronic hepatitis B (CHB) infection remains one of the most prevalent communicable diseases worldwide, especially in Asia. In Taiwan, approximately 13.7% of the population has CHB []. A significant number of individuals with CHB eventually progress to cirrhosis, which in severe cases can deteriorate into life-threatening decompensation characterized by severe complications []. These complications include jaundice, ascites, variceal bleeding, hepatorenal syndrome (HRS), hepatic encephalopathy (HE), and spontaneous bacterial peritonitis (SBP). Cirrhosis also substantially increases the risk of hepatocellular carcinoma (HCC), and most patients with HCC have a history of cirrhosis []. Each of these conditions is a serious consequence of cirrhosis that requires specific treatment; prompt detection and management of these conditions could substantially improve survival and reduce the burden of disease in these patients [].
In recent years, big data and machine learning (ML) algorithms have advanced rapidly, particularly for disease prediction and the identification of valuable biomarkers. This progress has had a significant impact on research into liver diseases. One notable Canadian study applied ML algorithms to multi-center patient data and detected all-cause advanced hepatic fibrosis with high classification and prediction performance [].
Multiple studies have also employed ML techniques to predict fatty liver disease, with promising results. In one study of 513 subjects using body composition and anthropometric data, the random forest (RF) algorithm achieved the highest accuracy (82%) among the ML methods tested []. Similarly, in a Taiwanese study using clinical and laboratory data from 31,930 individuals, an extreme gradient boosting model diagnosed fatty liver disease with an accuracy of 0.833 and an area under the receiver operating characteristic curve (AUROC) of 0.882 []; body mass index was the most influential feature in that study.
For HCC recurrence, several studies have used ML to build predictive models and inform preventive measures that support recurrence management [,,,]. Additionally, Farghaly et al. and Edeh et al. have applied ML approaches such as Naïve Bayes and ensemble models to the study of hepatitis C virus (HCV) infection [,].
Decompensation remains a significant mortality risk factor in patients with cirrhosis, and many patients experience multiple complications, that is, more than one of the conditions listed above []. However, only a few studies to date have applied ML methods to this group of patients. Using an RF model, one study predicted 1-year survival in cirrhosis patients after transjugular intrahepatic portosystemic shunt placement better than existing prognostic scores did []. Similarly, ML models have been used to predict all-cause mortality in cirrhosis patients, with clinical variables and laboratory test results as key features [,,]. Other studies have focused on specific decompensation endpoints among cirrhosis patients, such as HRS [] and esophageal varices [,,]. More recent studies have used ML techniques to predict overall decompensation, defined as the presence of ascites, HE, jaundice, variceal bleeding, or SBP, with promising performance [,]. Table 1 summarizes these studies.
Table 1. Analysis of the studies employing ML techniques to predict liver-related outcomes in cirrhosis patients.
In this study, we aimed to apply our own ML models to differentiate four decompensation states (variceal bleeding, ascites, jaundice, and multiple complications) in patients with HBV-related cirrhosis and to identify informative features that could enable timely intervention for these patients. To better emulate real-life scenarios, we considered both individual complications and multiple complications as prediction endpoints. This article is structured into five main sections: Introduction, Materials and Methods, Results, Discussion, and Conclusions.
Our goal is to provide clinicians with a tool that offers more accurate and timely prediction of decompensation, which will better inform prognosis and guide management.

2. Materials and Methods

We developed and validated ML models to predict various complications of HBV-related cirrhosis using electronic health records from a cohort of 50,047 patients diagnosed with hepatitis B virus (HBV) infection in a tertiary hospital setting.

2.1. Data Source

This study used de-identified electronic health records of hepatitis B patients with cirrhosis from 1 January 2003 to 31 December 2017, which included the patients’ disease history (International Classification of Diseases, Ninth and Tenth Editions, Clinical Modification; ICD-9-CM and ICD-10-CM codes), prescription records (name of medication, dose, date, and duration), laboratory tests (name of test, date performed, and value), and demographics (age, sex, weight, and height). The data were obtained from a 2202-bed medical center in central Taiwan and were accessed in a regulated setting between 22 October 2018 and 31 May 2021 for research purposes.
Extracted laboratory test items included HBV infection markers (HBV DNA, HBsAg, anti-HBs, HBeAg, and anti-HBe) for identifying the cirrhosis patients, liver biochemical tests (alanine transaminase (ALT), aspartate transaminase (AST), total bilirubin, albumin, prothrombin time, and platelet count), and other related laboratory tests (creatinine, sodium, fasting glucose, and alpha-fetoprotein (AFP)). Five antiviral drugs were initially identified: lamivudine (LAM), adefovir, entecavir (ETV), telbivudine, and tenofovir (TDF). In addition to the type of antiviral drug, the length of antiviral treatment and the patient’s medication adherence were considered. As a result, a total of 14 features, comprising patient demographics (age and sex), liver biochemical tests, other related laboratory tests, and medication adherence (measured as length of treatment in days and medication possession ratio), were used as model inputs for their potential impact on the occurrence of decompensation in cirrhosis patients (Table S1). The medication possession ratio (MPR) is the number of days on which the patient has a supply of medication divided by the total number of observed days. An MPR greater than 0.8 indicates highly adherent behavior, and an MPR of 0 indicates no use.
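The MPR definition above can be sketched in Python, the language the study reports using. This is a minimal illustration under stated assumptions: the `(dispense_date, days_supply)` fill format and the function name are hypothetical, and overlapping supplies are counted once so that the ratio cannot exceed 1; the text does not say how overlaps or early refills were handled.

```python
from datetime import date, timedelta


def medication_possession_ratio(fills, start, end):
    """Fraction of days in [start, end] covered by a dispensed supply.

    fills: list of (dispense_date, days_supply) tuples (hypothetical format).
    Overlapping supplies are counted once and coverage is clipped to the
    observation window, so the result lies between 0 and 1 (an assumption).
    """
    observed_days = (end - start).days + 1
    covered = set()
    for dispensed, days_supply in fills:
        for offset in range(days_supply):
            day = dispensed + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / observed_days


# Hypothetical prescription history for one patient over one observed year
fills = [(date(2010, 1, 1), 90), (date(2010, 4, 1), 90), (date(2010, 8, 1), 60)]
mpr = medication_possession_ratio(fills, date(2010, 1, 1), date(2010, 12, 31))
print(round(mpr, 3), "highly adherent" if mpr > 0.8 else "not highly adherent")
```

With these example fills, 240 of the 365 observed days are covered, so this hypothetical patient falls below the 0.8 threshold for high adherence.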
All data were analyzed anonymously. This study was performed according to the Declaration of Helsinki and was approved by the institutional review board of China Medical University Hospital (CMUH107-REC2-105). Informed consent to participate was waived by the same institutional review board since only de-identified records were used, and patient confidentiality was protected by anonymizing all data before use.

2.2. Study Population

Between 2003 and 2017, a total of 118,424 HBV antigen-reactive reports were identified. These reports belonged to 50,047 patients who were tested and diagnosed with HBV (HBV antigen-reactive) in both outpatient and inpatient encounters at the medical center. Only patients aged 20 years or older at the time of the report were included (n = 49,418). We then verified incident decompensation after cirrhosis diagnosis (diagnostic codes in Table S2) and the availability of laboratory data. To ensure the predictive ability of the models, we focused on individuals who consistently used a single antiviral drug rather than those with frequent changes in their treatment regimen (n = 135). We expect that, under normal circumstances, antiviral therapy is not changed frequently under the national health insurance scheme because of concerns about drug resistance. Because TDF became eligible for reimbursement for HBV treatment only in June 2011, the number of patients who used it continuously throughout our data collection period (2003–2017) was limited, and these patients were excluded from this study. In contrast, ETV and LAM had been eligible for reimbursement since 2006 and 1999, respectively, yielding larger samples with persistent use. The numbers of patients prescribed adefovir or telbivudine were also minimal.
The prediction endpoints were variceal bleeding, ascites, jaundice, and multiple complications, as defined by the patients’ diagnostic codes (Table S2). For the purpose of this study, multiple complications were defined as at least two complications occurring concurrently. We ensured an adequate sample size (n ≥ 100) for each of the four decompensation groups (ascites, variceal bleeding, jaundice, and multiple complications) before further analysis. HE, SBP, and HRS each had fewer than 100 patients and were therefore excluded. Figure 1 illustrates the steps of the subject selection process.
Figure 1. Selection flowchart for patients identified from the electronic health records. Abbreviations: HBV—hepatitis B virus; HE—hepatic encephalopathy; SBP—spontaneous bacterial peritonitis; HRS—hepatorenal syndrome; HCC—hepatocellular carcinoma.

2.3. Data Preprocessing

Before using the data obtained from the electronic health records, we identified outliers using upper (Q3 + 1.5 × IQR) and lower (Q1 − 1.5 × IQR) limits and treated them as missing values []. Because the parameters were not normally distributed, we imputed the missing values on the basis of a gamma distribution for each parameter, using the SciPy package (version 1.7.0) in Python 3.6. This approach yields imputed values that preserve the original distribution, whereas imputing the mean, median, or mode concentrates the imputed points at a single central value. By preserving the data’s dispersion and variability, this method can potentially enhance model performance, especially in heterogeneous clinical datasets []. We repeated this preprocessing iteratively, creating various combinations of numerical data for training, and selected the model with the best performance for prediction.
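A minimal sketch of the two preprocessing steps described above, using NumPy and SciPy: values outside the 1.5 × IQR fences become missing, and the missing values are then replaced with random draws from a gamma distribution fitted to the remaining observations. Fixing the gamma location at 0 (`floc=0`) and imputing by random draws are assumptions; the text does not report these details. The ALT-like values are synthetic.

```python
import numpy as np
from scipy import stats


def iqr_to_missing(x):
    """Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with NaN."""
    x = np.asarray(x, dtype=float).copy()
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)] = np.nan
    return x


def gamma_impute(x, rng):
    """Fill NaNs with draws from a gamma distribution fitted to observed values."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    # Maximum-likelihood fit with location fixed at 0 (assumption)
    shape, loc, scale = stats.gamma.fit(x[~missing], floc=0)
    x[missing] = stats.gamma.rvs(shape, loc=loc, scale=scale,
                                 size=int(missing.sum()), random_state=rng)
    return x


rng = np.random.default_rng(0)
alt = rng.gamma(shape=2.0, scale=20.0, size=500)    # synthetic ALT-like values (U/L)
alt[:5] = [900.0, np.nan, 1200.0, np.nan, 850.0]     # extreme values and gaps
cleaned = gamma_impute(iqr_to_missing(alt), rng)
print("any missing left:", np.isnan(cleaned).any())
```

Because the imputed values are random draws rather than one central value, runs with different seeds produce different datasets, which is one way an iterative repetition like the one described above could yield several training sets.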
Because the parameters (features) were measured on very different scales, we normalized them using min-max scaling, which maps the values of each feature to the range 0 to 1 based on that feature’s minimum and maximum values. The training dataset comprised 80% of the data, extracted arbitrarily from the electronic health records. This split ratio was chosen in view of the limitations of our dataset, which was relatively small and imbalanced [,]. The methods used for data normalization and splitting are detailed in our earlier publication, which provides a comprehensive guide to building the clinical decision support system [].
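The scaling and splitting steps can be sketched with scikit-learn as follows. The data are synthetic, and two choices are assumptions not stated in the text: the 80/20 split is stratified by outcome, and the scaler’s minimum and maximum are learned from the training portion only, so that no information from the 20% validation set leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
# Hypothetical feature matrix whose columns sit on very different scales
X = np.column_stack([
    rng.normal(52, 10, 200),      # age (years)
    rng.gamma(2.0, 20.0, 200),    # ALT (U/L)
    rng.normal(3.8, 0.5, 200),    # albumin (g/dL)
])
y = rng.integers(0, 2, 200)       # synthetic binary outcome

# 80% training / 20% validation, keeping the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = MinMaxScaler()                          # x' = (x - min) / (max - min)
X_train_scaled = scaler.fit_transform(X_train)   # min/max learned on training data
X_test_scaled = scaler.transform(X_test)         # reused; may fall slightly outside [0, 1]
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```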

2.4. Feature Selection and Balancing Datasets

Using the preprocessed data, we performed feature selection to eliminate redundant or irrelevant features and optimize model performance. We used univariate feature selection via the filter method, which evaluates each feature individually: the Chi-square test was applied to categorical variables and Student’s t-test to continuous variables to identify statistically significant differences with respect to each complication [,]. Statistically significant features (p < 0.05) were combined into different feature groups to enhance model performance in cross-validation.
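The filter-based selection can be sketched with SciPy: Student’s t-test (`ttest_ind`, whose default `equal_var=True` is the pooled-variance Student’s test) for continuous features, and a Chi-square test of independence (`chi2_contingency`) for categorical features, keeping those with p < 0.05. The function name, dictionary inputs, and synthetic values are hypothetical.

```python
import numpy as np
from scipy import stats


def univariate_filter(continuous, categorical, outcome, alpha=0.05):
    """Return {feature: p-value} for features associated with a 0/1 outcome.

    continuous / categorical: dict mapping feature name -> 1-D array (hypothetical).
    """
    outcome = np.asarray(outcome)
    selected = {}
    for name, values in continuous.items():
        values = np.asarray(values, dtype=float)
        p = stats.ttest_ind(values[outcome == 1], values[outcome == 0]).pvalue
        if p < alpha:
            selected[name] = p
    for name, values in categorical.items():
        values = np.asarray(values)
        # Contingency table: one row per category level, one column per outcome
        table = np.array([[np.sum((values == level) & (outcome == k)) for k in (0, 1)]
                          for level in np.unique(values)])
        p = stats.chi2_contingency(table)[1]   # Yates' correction applies to 2 x 2 tables by default
        if p < alpha:
            selected[name] = p
    return selected


rng = np.random.default_rng(1)
outcome = np.repeat([0, 1], 100)               # 100 compensated, 100 decompensated
continuous = {
    "albumin": np.concatenate([rng.normal(3.9, 0.4, 100), rng.normal(2.9, 0.4, 100)]),
    "creatinine": rng.normal(1.0, 0.2, 200),   # unrelated to the outcome by construction
}
categorical = {"sex": rng.integers(0, 2, 200)}
selected = univariate_filter(continuous, categorical, outcome)
print(selected)
```

As the limitations section notes, a filter of this kind scores each feature in isolation and cannot detect interactions between features.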
Next, one-to-one matching between compensated and decompensated patients was performed for each prediction endpoint. This matching ensured balanced training datasets and prevented the models from favoring the larger class. We also verified that the proportion of antiviral users was equal between the two patient groups.
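The text does not specify the matching variables, so the sketch below substitutes simple random undersampling as an assumption: within each antiviral group, equal numbers of decompensated and compensated patients are drawn, which yields a 1:1 dataset with the same drug mix in both classes. The function name and data are hypothetical.

```python
import numpy as np


def balance_one_to_one(y, drug, rng):
    """Indices of a 1:1 sample of decompensated (y=1) and compensated (y=0)
    patients, drawn separately within each antiviral group (random
    undersampling; an assumed stand-in for the study's matching)."""
    y = np.asarray(y)
    drug = np.asarray(drug)
    keep = []
    for d in np.unique(drug):
        pos = np.flatnonzero((y == 1) & (drug == d))
        neg = np.flatnonzero((y == 0) & (drug == d))
        n = min(len(pos), len(neg))              # size of the smaller class in this group
        keep.extend(rng.choice(pos, n, replace=False))
        keep.extend(rng.choice(neg, n, replace=False))
    return np.sort(np.array(keep, dtype=int))


rng = np.random.default_rng(7)
y = np.array([1] * 30 + [0] * 170)               # 30 decompensated, 170 compensated
drug = np.array(["ETV", "LAM"] * 100)            # alternating antiviral labels
idx = balance_one_to_one(y, drug, rng)
print(len(idx), y[idx].sum())
```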

2.5. Machine Learning Models

The scikit-learn package (version 1.5.1) in Python 3.6 was used to implement the four ML algorithms in this study: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and RF. We selected these algorithms because they are widely used and potentially effective for classifying such data. SVM is a supervised algorithm that determines an optimal decision boundary between classes. Its primary goal is to maximize the separation, or margin, between classes; the margin is defined by the support vectors, the training points closest to the decision boundary. SVM can handle both linear and non-linear classification and is well suited to high-dimensional data. In this study, the classifier used the radial basis function (RBF) kernel, which allows a non-linear decision boundary in the original feature space.
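The RBF kernel scores the similarity of two feature vectors as exp(−γ‖x − z‖²), which lets an SVM draw a non-linear boundary. The sketch below computes the kernel directly and then fits scikit-learn’s `SVC(kernel="rbf")` to synthetic data whose classes are separated by a circle; the `C` and `gamma` values are placeholders, not the hyperparameters listed in Table 2.

```python
import numpy as np
from sklearn.svm import SVC


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))


# Similarity is 1 for identical points and decays with squared distance
print(rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=1.0))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))

# Synthetic, non-linearly separable data: class 1 lies inside a circle
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (np.hypot(X[:, 0], X[:, 1]) < 0.6).astype(int)

# C and gamma are illustrative placeholders
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
print("training accuracy:", svm.score(X, y))
print("support vectors per class:", svm.n_support_)
```

A straight line cannot separate points inside a circle from those outside it, which is the kind of structure the RBF kernel is meant to capture.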
LR is a statistical modeling technique that relates a binary dependent variable (the outcome) to independent variables (the input features) through the logistic function. During training, the model adjusts its parameters to maximize the likelihood of the observed outcomes under the predicted probabilities. LR is widely used because of its simplicity, interpretability, and effectiveness in various applications.
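To make the logistic link concrete, the sketch below fits scikit-learn’s `LogisticRegression` to synthetic data and checks that its predicted probabilities equal the logistic function applied to the fitted linear score. Note that scikit-learn’s default adds an L2 penalty to the log-likelihood; whether the study kept this default is not stated in the text (see Table 2).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(300, 2))            # two min-max-scaled features (synthetic)
logit = -1.0 + 4.0 * X[:, 0] - 2.0 * X[:, 1]    # hypothetical true log-odds
y = (rng.uniform(size=300) < sigmoid(logit)).astype(int)

model = LogisticRegression().fit(X, y)          # maximizes an L2-penalized log-likelihood
manual = sigmoid(X @ model.coef_.ravel() + model.intercept_[0])
print(np.allclose(manual, model.predict_proba(X)[:, 1]))
print("coefficients (log-odds per unit feature):", model.coef_.ravel())
```

The fitted coefficients are changes in log-odds per unit of each feature, which is the source of the interpretability mentioned above.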
DT, another widely used ML algorithm, partitions data into subsets according to decision rules learned from the data. At each node, a decision rule (branch) is created from the input feature values using a splitting criterion. This process continues recursively until a stopping condition is met, producing a tree whose splits and decision rules separate the classes of data.
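A minimal sketch of how a single split is chosen, assuming the Gini impurity criterion (scikit-learn’s default for `DecisionTreeClassifier`; the criterion actually used would be listed in Table 2, not in the text). The search tries each midpoint between sorted feature values and keeps the threshold with the lowest weighted impurity; a tree applies the same search recursively to each child node. The albumin values and labels are made up.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= t' split."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(list(labels)))             # no split: parent impurity
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # midpoint between neighbours
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best


# Synthetic albumin values (g/dL); label 1 = decompensated
albumin = [2.4, 2.7, 2.9, 3.1, 3.6, 3.9, 4.1, 4.3]
label = [1, 1, 1, 0, 1, 0, 0, 0]
print(best_split(albumin, label))
```

Here the split at about 3.0 g/dL leaves a pure left node and lowers the weighted impurity from 0.5 to 0.2.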
RF combines multiple decision trees using the bagging technique: each tree is trained independently on a bootstrap sample of the training data and makes predictions from the feature values. The final prediction aggregates the predictions of all trees, by voting for classification or averaging for regression. RF generally outperforms a single decision tree, with higher accuracy and a lower risk of overfitting. The hyperparameters of the algorithms used in this study are presented in Table 2.
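The bagging idea can be sketched by hand with scikit-learn decision trees: each tree sees its own bootstrap sample and a random subset of features at each split, and the ensemble predicts by majority vote. This is an illustration on synthetic data, not the study’s configuration, and the function names are hypothetical. Note that scikit-learn’s own `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by hard voting.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees, rng):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    trees = []
    n = len(y)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                    # draw n rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",  # random feature subset per split
                                      random_state=int(rng.integers(2**31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Hard majority vote over the trees' 0/1 predictions (use an odd n_trees)."""
    votes = np.stack([tree.predict(X) for tree in trees])   # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


rng = np.random.default_rng(5)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                     # synthetic labels
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = fit_bagged_trees(X_train, y_train, n_trees=51, rng=rng)
print("single tree accuracy:", (single.predict(X_test) == y_test).mean())
print("bagged ensemble accuracy:", (predict_majority(forest, X_test) == y_test).mean())
```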
Table 2. Algorithm hyperparameters used in this study.
Our aim was to select the best-performing model for predicting decompensation among HBV-related cirrhosis patients; these four ML algorithms were chosen because their classification approaches differ. For more detail on the algorithms used, please refer to our previously published research [].

2.6. Training

Ten-fold cross-validation was used during training to evaluate model accuracy: an accuracy score was calculated after each fold, and the models with the highest average accuracy across the 10 folds were advanced to validation. Given our limited data, we initially trained the models using all features and explored various feature combinations without applying feature selection. Because this trial-and-error approach did not yield consistently strong results across all models, we ultimately adopted feature selection as a more effective strategy. For transparency, these preliminary results are included in the Supplementary Materials (Table S3).
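The model-comparison step can be sketched with scikit-learn’s `cross_val_score` and a 10-fold `StratifiedKFold`: each candidate’s fold accuracies are averaged, and the one with the highest mean advances to validation. The synthetic data, stratified shuffling, and default hyperparameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 8))            # synthetic, min-max-scaled features
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.2, 300) > 0.9).astype(int)

models = {
    "SVM": SVC(kernel="rbf"),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Mean accuracy over the 10 folds for each algorithm
mean_acc = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
            for name, model in models.items()}
best = max(mean_acc, key=mean_acc.get)
for name, acc in sorted(mean_acc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean 10-fold accuracy = {acc:.3f}")
print("advanced to validation:", best)
```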

2.7. Validation

The validated ML models were compared using AUROC and accuracy. Accuracy is the proportion of all predictions that are correct (true positives plus true negatives), whereas AUROC is the area under the receiver operating characteristic curve, which plots sensitivity against 1 − specificity across classification thresholds; it is a key evaluation metric for ML models. Higher values of both metrics indicate better model performance. Based on recent related studies and clinical applications, an AUROC between 0.7 and 0.8 is deemed acceptable [], whereas values of 0.8–0.9 indicate good performance and values of 0.9–1.0 indicate exceptional performance.
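Both metrics can be computed with `sklearn.metrics`, and the interpretation bands above can be expressed as a small helper. The labels, scores, 0.5 decision threshold, and inclusive lower band boundaries in this sketch are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, auc, roc_auc_score, roc_curve

# Hypothetical validation labels and predicted probabilities of decompensation
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.10, 0.35, 0.40, 0.60, 0.30, 0.65, 0.70, 0.90])
y_pred = (y_score >= 0.5).astype(int)          # 0.5 threshold is an assumption

acc = accuracy_score(y_true, y_pred)           # (TP + TN) / all predictions
fpr, tpr, _ = roc_curve(y_true, y_score)       # 1 - specificity, sensitivity per threshold
auroc = roc_auc_score(y_true, y_score)         # area under the ROC curve


def describe_auroc(value):
    """Map an AUROC to the bands used in this section (boundary handling assumed)."""
    if value >= 0.9:
        return "exceptional"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "acceptable"
    return "below acceptable"


print(f"accuracy={acc:.2f}, AUROC={auroc:.4f} ({describe_auroc(auroc)})")
```

Here 6 of 8 predictions are correct (accuracy 0.75), and 13 of the 16 positive-negative pairs are ranked correctly, giving an AUROC of 0.8125, which falls in the "good" band.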

3. Results

We tested a total of 32 ML models: four models for each of the four decompensation endpoints, with each endpoint examined separately in two groups of antiviral users (LAM and ETV). After balancing the datasets, we obtained the patient populations used to train and test the algorithms for each prediction endpoint and antiviral therapy. Table 3 lists the sample sizes used for each prediction outcome in the training and validation sets (the validation set being a completely separate, unused 20% of the data for testing the models’ ability to generalize to new data). Tables S4–S7 provide the characteristics of the patients included in the classification models.
Table 3. Sample sizes used in the training and validation sets.

3.1. Variceal Bleeding

For LAM users, data from 652 patients (326 positive, 326 negative) were used for training and 164 (82 positive, 82 negative) for validation. The mean age of patients was 52 years, and most were male (76%). The eight selected features were AST, total bilirubin, albumin, fasting glucose, creatinine, prothrombin time, platelet count, and medication possession ratio (Table S8). The SVM model achieved the highest AUROC and accuracy, with scores of 0.71 and 0.70, respectively (Table 4). The LR model also performed well, with an AUROC of 0.71 and an accuracy of 0.68.
Table 4. Comparison of model performance for different medication groups after validation.
For ETV users, data from 748 patients (374 positive, 374 negative) were used for training and 186 for validation. The demographic profile was similar to that of LAM users (i.e., male predominance and a mean age of 52 years). The eleven selected features were sex, AST, total bilirubin, albumin, fasting glucose, creatinine, prothrombin time, platelet count, AFP, length of treatment, and medication possession ratio. On testing, the four trained classifiers achieved AUROCs ranging from 0.63 to 0.79. As with LAM users, the SVM model showed the best predictive ability (AUROC: 0.79, accuracy: 0.72).

3.2. Ascites

The mean age of patients with ascites was 55 to 56 years, and most (64 to 65%) were male. For LAM users, data from 178 patients were used for training and 44 for validation. The ten selected features are shown in Table S8. The RF model had the best AUROC of the four models (0.76; accuracy: 0.70). For ETV users, data from 222 patients were used for training and 56 for validation, again with ten selected features. In contrast to the LAM group, LR showed the best performance (AUROC: 0.93, accuracy: 0.88), closely followed by the SVM model (AUROC: 0.92, accuracy: 0.86).

3.3. Jaundice

Eighty percent of jaundice patients on the LAM regimen were male, with a mean age of 53 years. Data from 120 patients were used for training and 30 for validation. The RF model achieved exceptional performance (AUROC: 0.91, accuracy: 0.87), followed by the SVM model (AUROC: 0.87, accuracy: 0.77).
The nine features selected for the LAM and ETV models were identical: ALT, AST, total bilirubin, albumin, fasting glucose, prothrombin time, sodium, length of treatment, and medication possession ratio (Table S8). Under ETV treatment, 79% of jaundice patients were male, and their mean age was 52 years. Data from 172 patients were used for training and 44 for validation. Among the four algorithms, the RF model had the highest AUROC and accuracy (0.81 and 0.73, respectively).
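As a quick consistency check, the sketch below enumerates the 4 × 4 × 2 model grid described at the start of this section and computes the validation share for each training/validation pair reported in Sections 3.1 to 3.3. Only the sample sizes come from the text; each share falls between about 0.198 and 0.204, close to the intended 20%.

```python
from itertools import product

algorithms = ["SVM", "LR", "DT", "RF"]
endpoints = ["variceal bleeding", "ascites", "jaundice", "multiple complications"]
antivirals = ["LAM", "ETV"]
grid = list(product(antivirals, endpoints, algorithms))   # one model per combination
print("models trained:", len(grid))

# (training, validation) sizes reported in Sections 3.1-3.3
reported = {
    ("LAM", "variceal bleeding"): (652, 164),
    ("ETV", "variceal bleeding"): (748, 186),
    ("LAM", "ascites"): (178, 44),
    ("ETV", "ascites"): (222, 56),
    ("LAM", "jaundice"): (120, 30),
    ("ETV", "jaundice"): (172, 44),
}
for (drug, endpoint), (n_train, n_val) in reported.items():
    share = n_val / (n_train + n_val)
    print(f"{drug:3} {endpoint:18} validation share = {share:.3f}")
```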

3.4. Multiple Complications

For cirrhosis patients who developed multiple complications, SVM predicted the endpoint with an AUROC of 0.85 and an accuracy of 0.77 under the ETV regimen using 13 selected features (Table S8). For patients on LAM therapy, our best model (LR) achieved only an AUROC of 0.74 and an accuracy of 0.68 with 12 features; the SVM model showed an AUROC of 0.73 and an accuracy of 0.71. The demographics of patients with multiple complications were similar to those of the other complication groups.
Overall performance and model rankings are summarized in Table 5. Under LAM therapy, the RF algorithm performed best in predicting jaundice and ascites. Under the ETV regimen, the LR algorithm performed best in predicting ascites, whereas the SVM algorithm performed best in predicting variceal bleeding and multiple complications.
Table 5. Ranking of model output for each decompensation endpoint by antiviral drug.

4. Discussion

This is one of the few studies to date to employ ML methods to predict decompensation, including multiple complications, in HBV-related cirrhosis patients. Using 15 years of electronic health record data, we tested four ML approaches and showed that SVM and RF classifiers produced the best overall predictions of decompensation in HBV-related cirrhosis patients, with AUROCs of 0.85 to 0.93 and accuracies of 0.77 to 0.88 for ascites, jaundice, and multiple complications. This performance closely resembles that of ML models in past studies of similar endpoints. Using an RF classifier, Dong et al. identified cirrhosis patients at risk of esophageal varices with AUROCs of 0.84 and 0.82 in the training and validation sets, respectively []. In another recent study predicting prior decompensation, an RF model achieved AUROCs of 0.95 and 0.87 on training and test data, respectively []. Other studies have demonstrated comparable performance in predicting non-specific liver disease and liver disease-related mortality using ML and statistical methods, with reported accuracies above 0.8 and best AUROCs ranging from 0.8 to 0.9 [,,].
By contrast, some studies have predicted specific outcomes in patients with liver cirrhosis with high accuracy using neural network models. An artificial neural network using clinical and biochemical parameters as key inputs achieved an AUC of 0.959 in predicting variceal bleeding []. Another study predicted broader, longer-term outcomes of decompensation and liver-related death using a convolutional neural network that integrated electrocardiogram data, achieving an AUC of 0.933 []. The use of deep learning models in these studies likely contributed to their high performance, as such models can better capture non-linear relationships and detect “hidden” signals in complex clinical data.
Extreme Gradient Boosting (XGBoost) algorithms have also been used to predict decompensation, with comparable results. A study predicting HRS reported AUCs of 0.832 in the training set and 0.8415 in the validation set []. Another study predicting variceal bleeding with the same algorithm reported accuracies of 93.7% in the internal validation set and 85.7% in the external validation set []. These consistently high performance metrics across different outcomes highlight XGBoost as a potentially robust and generalizable tool for risk stratification in advanced liver disease.
In comparison, the SVM and RF models in our study achieved AUROCs of 0.85 and 0.91 and accuracies of 0.77 and 0.87, respectively, which may adequately meet physicians’ expectations for clinical decision support. In our study, SVM and RF generally outperformed LR and DT, potentially because of their classification properties. RF generally outperforms DT because, as an ensemble learning technique that aggregates many decision trees, it reduces variance compared with a single decision tree trained on the same dataset [,,]. RF has also been shown to achieve better accuracy than LR, especially as the number of features and the dimensionality increase []. SVM is frequently used in disease prediction, with SVM and RF often demonstrating the highest accuracies [,].
Among all tested ML models, however, the LR model differentiating ascites in ETV users performed best (AUROC: 0.93, 95% CI: 0.86–0.99; accuracy: 0.88), closely followed by the SVM model (AUROC: 0.92, accuracy: 0.86). Our results indicate that markers commonly tested in cirrhosis patients, including ALT, total bilirubin, albumin, platelet count, and age, are strong risk factors for decompensation. This agrees with previous studies that identified ALT, platelet count, total bilirubin, albumin, and prothrombin time as significant independent predictors of hepatic decompensation [,]. Several ML studies have also identified serum albumin, ALT, platelet count, and hemoglobin as significant risk factors for decompensated cirrhosis [,,]. Given these results, we are optimistic that ML algorithms may be valuable tools for the timely prevention and management of ascites secondary to HBV-related cirrhosis.
We also highlight the SVM model predicting multiple complications in patients on the ETV regimen (AUROC: 0.85, accuracy: 0.77), which required 13 features, more than any other model. We hypothesize that the larger number of features needed to predict multiple complications may be explained in part by noise in the patient characteristics: various underlying factors associated with the presence of multiple complications may influence the predictions and reduce accuracy. Nevertheless, this is one of the first studies to consider multiple complications as an outcome of interest, and future studies will be needed to validate our model.

4.1. Theoretical and Practical Implications

Given our results, ML-based prediction models may serve as a valuable aid in clinical decision making about decompensation. Across our ML models, AST, total bilirubin, albumin, and prothrombin time were the most frequently selected features, appearing in the models for every complication in both medication groups. Unsurprisingly, these four biomarkers are common liver biochemical tests that reflect the extent of liver tissue damage and disease severity. This study thus supports the clinical relevance of established liver damage markers by demonstrating their consistent importance across complications, regardless of the antiviral used. Routine monitoring of these markers is therefore strongly recommended when assessing the risk of decompensation in practice.
Medication also emerged as an important predictor of almost all decompensation outcomes examined. Not only the type of antiviral drug but also the length of treatment and the medication possession ratio of the prescribed regimen were identified as critical factors in the occurrence of decompensation. This suggests that antiviral management itself, including adherence and regimen choice, is an independent and significant determinant of the clinical course in HBV-related cirrhosis. ETV has shown superior virologic and biochemical efficacy compared with LAM, whereas long-term use of LAM is associated with a higher risk of viral resistance, which reduces its ability to suppress viral replication [,]. Our results also highlight that both an appropriate length of antiviral treatment prescribed by clinicians and patient adherence to those prescriptions are significant factors in preventing decompensation. Currently available evidence strongly supports the notion that sustained adherence to antiviral treatment in patients with HBV and HBV-related cirrhosis protects against decompensation and mortality [,]. This emphasizes the need for intensive patient adherence programs and for minimizing treatment interruptions. We consider this a strength of our work: although adherence has been examined extensively in epidemiological studies, it was not considered in previous ML studies.

4.2. Limitations of Study

Nevertheless, this study has several limitations. First, our data were limited to one tertiary hospital, which made our patient sample relatively homogeneous, and for certain complications such as HE and SBP, the available sample sizes were insufficient to train and validate the models adequately. Second, we did not consider cirrhosis of non-HBV etiology, as our primary intention was to focus on this specific group of patients and maintain the scope of our research. Third, we were unable to obtain an adequate sample of patients on the TDF regimen, since TDF became available for reimbursement much later than ETV and LAM. As a result, the applicability of the features and developed models to TDF users, as well as to individuals using multiple (interchanging) antiviral drugs, requires further examination in future studies. Even so, our results can serve as a guiding example for future classification models, highlighting antiviral treatment and medication adherence as probable predictors of decompensation. Fourth, we did not validate our models on an independent dataset from an external source, because data privacy and confidentiality concerns prevented the exchange of information across healthcare providers. However, many previous studies using ML algorithms to predict disease outcomes have similarly divided a single dataset into one part for training and one for testing [,]. Finally, univariate feature selection has limitations: it cannot account for interactions between features and oversimplifies the ranking process by evaluating features individually. It may therefore overlook combined effects, reducing model performance through the exclusion of significant interactions [,].

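The interaction problem of univariate feature selection noted in the limitations can be made concrete with a toy example. Assume, hypothetically, an outcome equal to x1 XOR x2: each feature is uncorrelated with the outcome on its own, so a per-feature score ranks both at zero, while a noisy proxy x3 ranks first. The scoring rule (absolute Pearson correlation), the feature names and the data are illustrative assumptions, not the selection procedure or data used in this study.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def univariate_rank(features, labels):
    """Score each feature in isolation by |correlation| with the label, best first."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 100 rows covering every (x1, x2) combination 25 times.
rows = [combo for _ in range(25) for combo in product((0, 1), repeat=2)]
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]  # outcome depends only on the x1-x2 interaction
# x3 is a noisy proxy: it copies y in 7 of every 10 rows and flips it otherwise.
x3 = [t if i % 10 < 7 else 1 - t for i, t in enumerate(y)]

ranking = univariate_rank({"x1": x1, "x2": x2, "x3": x3}, y)
# x3 ranks first; x1 and x2 each score 0.0 on their own.

pair_accuracy = sum((a ^ b) == t for a, b, t in zip(x1, x2, y)) / len(y)  # uses both features
proxy_accuracy = sum(p == t for p, t in zip(x3, y)) / len(y)             # uses x3 alone
```

A selector keeping only the top-ranked feature would retain x3, which matches the outcome in 70% of rows, and discard the pair that determines it exactly. Multivariable approaches, such as wrapper or embedded selection, can in principle retain such pairs.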
5. Conclusions

In conclusion, we validated self-developed ML models based on electronic health records that may help predict the risk of decompensation among HBV-related cirrhosis patients. Routinely collected clinical information and treatment regimens may serve as informative features in differentiating potential complications in this patient population. The findings of this study also underline the prospects for adopting ML-based methods in assessing the risk of liver-related outcomes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics15212790/s1, Table S1: List of input features. Table S2: ICD-9-CM and ICD-10-CM codes used to identify cirrhosis and complications in HBV-related cirrhosis patients. Table S3: Performance metrics of the machine learning models. Table S4: Characteristic profile of balanced patient samples used in predicting variceal bleeding. Table S5: Characteristic profile of balanced patient samples used in predicting ascites. Table S6: Characteristic profile of balanced patient samples used in predicting jaundice. Table S7: Characteristic profile of balanced patient samples used in predicting multiple complications. Table S8: Selected features for each model.

Author Contributions

Conceptualization, H.-C.L., M.-L.H. and V.C.-R.H.; formal analysis, H.-C.L., M.-Y.L. and V.C.-R.H.; data curation, M.-Y.L., C.-C.K. and V.C.-R.H.; writing—original draft preparation, H.-C.L., M.-L.H., S.-H.S., M.-S.H. and V.C.-R.H.; writing—review and editing, H.-C.L., M.-L.H. and V.C.-R.H.; funding acquisition, H.-C.L. and V.C.-R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Science and Technology Council (NSTC) Taiwan, grant IDs: MOST 106-2410-H-039-008 and MOST 107-2314-B-039-065-MY3 awarded to VCRH, and grants MOST 109-2121-M-039-001, MOST 110-2121-M-039-001, and MOST 111-2121-C-039-002 awarded to HCL. Additional support was provided by China Medical University, grant IDs: CMU113-S-54 awarded to VCRH, and CMU110-MF-111 and CMU111-MF-116 awarded to HCL.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of China Medical University Hospital (CMUH107-REC2-105, 9 July 2018).

Data Availability Statement

The raw data used in this study remain confidential and will not be shared, in accordance with China Medical University Hospital policy.

Acknowledgments

We are grateful to the Big Data Center at China Medical University Hospital for providing data, tools, platforms, and professional support. All data were anonymized and not identifiable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AFP: alpha-fetoprotein
ALT: alanine aminotransferase
ANN: Artificial Neural Network
Anti-HBe: hepatitis B e antibody
Anti-HBs: hepatitis B surface antibody
AST: aspartate aminotransferase
AUROC: area under the receiver operating characteristic curve
CHB: chronic hepatitis B
CI: confidence interval
CNN: Convolutional Neural Network
DNN: Deep Neural Network
DT: Decision Tree
ETV: entecavir
HBeAg: hepatitis B e antigen
HBsAg: hepatitis B surface antigen
HBV: hepatitis B virus
HCC: hepatocellular carcinoma
HCV: hepatitis C virus
HE: hepatic encephalopathy
HRS: hepatorenal syndrome
ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
ICD-10-CM: International Classification of Diseases, Tenth Revision, Clinical Modification
LAM: lamivudine
LASSO: Least Absolute Shrinkage and Selection Operator
LR: Logistic Regression
ML: machine learning
PT: prothrombin time
RF: Random Forest
SBP: spontaneous bacterial peritonitis
SVM: Support Vector Machine
TDF: tenofovir disoproxil fumarate
XGBoost: Extreme Gradient Boosting

References

  1. Chen, C.-L.; Yang, J.-Y.; Lin, S.-F.; Sun, C.-A.; Bai, C.-H.; You, S.-L.; Chen, C.-J.; Kao, J.-H.; Chen, P.-J.; Chen, D.-S. Slow decline of hepatitis B burden in general population: Results from a population-based survey and longitudinal follow-up study in Taiwan. J. Hepatol. 2015, 63, 354–363.
  2. Fattovich, G.; Bortolotti, F.; Donato, F. Natural history of chronic hepatitis B: Special emphasis on disease progression and prognostic factors. J. Hepatol. 2008, 48, 335–352.
  3. Fattovich, G.; Stroffolini, T.; Zagni, I.; Donato, F. Hepatocellular carcinoma in cirrhosis: Incidence and risk factors. Gastroenterology 2004, 127 (Suppl. 1), S35–S50.
  4. Peng, C.Y.; Chien, R.N.; Liaw, Y.F. Hepatitis B virus-related decompensated liver cirrhosis: Benefits of antiviral therapy. J. Hepatol. 2012, 57, 442–450.
  5. Sarvestany, S.S.; Kwong, J.C.; Azhie, A.; Dong, V.; Cerocchi, O.; Ali, A.F.; Karnam, R.S.; Kuriry, H.; Shengir, M.; Candido, E.; et al. Development and validation of an ensemble machine learning framework for detection of all-cause advanced hepatic fibrosis: A retrospective cohort study. Lancet Digit. Health 2022, 4, e188–e199.
  6. Razmpour, F.; Daryabeygi-Khotbehsara, R.; Soleimani, D.; Asgharnezhad, H.; Shamsi, A.; Bajestani, G.S.; Nematy, M.; Pour, M.R.; Maddison, R.; Islam, S.M.S. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci. Rep. 2023, 13, 4942.
  7. Chen, Y.-Y.; Lin, C.-Y.; Yen, H.-H.; Su, P.-Y.; Zeng, Y.-H.; Huang, S.-P.; Liu, I.-L. Machine-learning algorithm for predicting fatty liver disease in a Taiwanese population. J. Pers. Med. 2022, 12, 1026.
  8. Wong, G.L.-H.; Hui, V.W.-K.; Tan, Q.; Xu, J.; Lee, H.W.; Yip, T.C.-F.; Yang, B.; Tse, Y.-K.; Yin, C.; Lyu, F.; et al. Novel machine learning models outperform risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis. JHEP Rep. 2022, 4, 100441.
  9. Kim, H.Y.; Lampertico, P.; Nam, J.Y.; Lee, H.C.; Kim, S.U.; Sinn, D.H.; Seo, Y.S.; Lee, H.A.; Park, S.Y.; Lim, Y.S.; et al. An artificial intelligence model to predict hepatocellular carcinoma risk in Korean and Caucasian patients with chronic hepatitis B. J. Hepatol. 2022, 76, 311–318.
  10. Liao, H.; Xiong, T.; Peng, J.; Xu, L.; Liao, M.; Zhang, Z.; Wu, Z.; Yuan, K.; Zeng, Y. Classification and prognosis prediction from histopathological images of hepatocellular carcinoma by a fully automated pipeline based on machine learning. Ann. Surg. Oncol. 2020, 27, 2359–2369.
  11. Saito, A.; Toyoda, H.; Kobayashi, M.; Koiwa, Y.; Fujii, H.; Fujita, K.; Maeda, A.; Kaneoka, Y.; Hazama, S.; Nagano, H.; et al. Prediction of early recurrence of hepatocellular carcinoma after resection using digital pathology images assessed by machine learning. Mod. Pathol. 2021, 34, 417–425.
  12. Farghaly, H.M.; Shams, M.Y.; Abd El-Hafeez, T. Hepatitis C Virus prediction based on machine learning framework: A real-world case study in Egypt. Knowl. Inf. Syst. 2023, 65, 2595–2617.
  13. Edeh, M.O.; Dalal, S.; Dhaou, I.B.; Agubosim, C.C.; Umoke, C.C.; Richard-Nnabu, N.E.; Dahiya, N. Artificial intelligence-based ensemble learning model for prediction of hepatitis C disease. Front. Public Health 2022, 10, 892371.
  14. Hirode, G.; Saab, S.; Wong, R.J. Trends in the burden of chronic liver disease among hospitalized US adults. JAMA Netw. Open 2020, 3, e201997.
  15. Da, B.; Chen, H.; Wu, W.; Guo, W.; Zhou, A.; Yin, Q.; Gao, J.; Chen, J.; Xiao, J.; Wang, L.; et al. Development and validation of a machine learning-based model to predict survival in patients with cirrhosis after transjugular intrahepatic portosystemic shunt. EClinicalMedicine 2024, 79, 103001.
  16. Guo, A.; Mazumder, N.R.; Ladner, D.P.; Foraker, R.E. Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning. PLoS ONE 2021, 16, e0256428.
  17. Kanwal, F.; Taylor, T.J.; Kramer, J.R.; Cao, Y.; Smith, D.; Gifford, A.L.; El-Serag, H.B.; Naik, A.D.; Asch, S.M. Development, validation, and evaluation of a simple machine learning model to predict cirrhosis mortality. JAMA Netw. Open 2020, 3, e2023780.
  18. Guo, C.; Liu, Z.; Fan, H.; Wang, H.; Zhang, X.; Zhao, S.; Li, Y.; Han, X.; Wang, T.; Chen, X.; et al. Machine-learning-based plasma metabolomic profiles for predicting long-term complications of cirrhosis. Hepatology 2025, 81, 168–180.
  19. Yao, F.; Luo, J.; Zhou, Q.; Wang, L.; He, Z. Development and validation of a machine learning-based prediction model for hepatorenal syndrome in liver cirrhosis patients using MIMIC-IV and eICU databases. Sci. Rep. 2025, 15, 2743.
  20. Hou, Y.; Yu, H.; Zhang, Q.; Yang, Y.; Liu, X.; Wang, X.; Jiang, Y. Machine learning-based model for predicting the esophagogastric variceal bleeding risk in liver cirrhosis patients. Diagn. Pathol. 2023, 18, 29.
  21. Agarwal, S.; Sharma, S.; Kumar, M.; Venishetty, S.; Bhardwaj, A.; Kaushal, K.; Gopi, S.; Mohta, S.; Gunjan, D.; Saraya, A.; et al. Development of a machine learning model to predict bleed in esophageal varices in compensated advanced chronic liver disease: A proof of concept. J. Gastroenterol. Hepatol. 2021, 36, 2935–2942.
  22. Dong, T.S.; Kalani, A.; Aby, E.S.; Le, L.; Luu, K.; Hauer, M.; Kamath, R.; Lindor, K.D.; Tabibian, J.H. Machine learning-based development and validation of a scoring system for screening high-risk esophageal varices. Clin. Gastroenterol. Hepatol. 2019, 17, 1894–1901.e1.
  23. Müller, S.E.; Casper, M.; Ripoll, C.; Zipprich, A.; Horn, P.; Krawczyk, M.; Lammert, F.; Reichert, M.C. Machine learning models predicting decompensation in cirrhosis. J. Gastrointest. Liver Dis. 2025, 34, 71–80.
  24. Ahn, J.C.; Rattan, P.; Starlinger, P.; Juanola, A.; Moreta, M.J.; Colmenero, J.; Aqel, B.; Keaveny, A.P.; Mullan, A.F.; Liu, K.; et al. AI-Cirrhosis-ECG (ACE) score for predicting decompensation and liver outcomes. JHEP Rep. 2025, 7, 101356.
  25. Ch Sanjeev Kumar, D.; Ajit Kumar, B.; Satchidananda, D.; Ashish, G. An outliers detection and elimination framework in classification task of data mining. Decis. Anal. J. 2023, 6, 100164.
  26. Niederhut, D. Safe handling instructions for missing data. In Proceedings of the 17th Python in Science Conference, Austin, TX, USA, 9–15 July 2018; Available online: https://proceedings.scipy.org/articles/Majora-4af1f417-008.pdf (accessed on 29 October 2025).
  27. Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation; Departmental Technical Reports (CS) 1209; University of Texas at El Paso: El Paso, TX, USA, 2018.
  28. Park, D.J.; Park, M.W.; Lee, H.; Kim, Y.J.; Kim, Y.; Park, Y.H. Development of machine learning model for diagnostic disease prediction based on laboratory tests. Sci. Rep. 2021, 11, 7567.
  29. Hsieh, V.C.; Liu, M.Y.; Lin, H.C. AI-enabled clinical decision support system modeling for the prediction of cirrhosis complications. IRBM 2024, 45, 100854.
  30. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A review of feature selection methods for machine learning-based disease risk prediction. Front. Bioinform. 2022, 2, 927312.
  31. Amin, R.; Yasmin, R.; Ruhi, S.; Rahman, M.H.; Reza, M.S. Prediction of chronic liver disease patients using integrated projection based statistical feature extraction with machine learning algorithms. Inform. Med. Unlocked 2023, 36, 101155.
  32. Dritsas, E.; Trigka, M. Supervised machine learning models for liver disease risk prediction. Computers 2023, 12, 19.
  33. Lin, Y.J.; Chen, R.J.; Tang, J.H.; Yu, C.S.; Wu, J.L.; Chen, L.C.; Chang, S.S. Machine-learning monitoring system for predicting mortality among patients with noncancer end-stage liver disease: Retrospective study. JMIR Med. Inform. 2020, 8, e24305.
  34. Uddin, S.; Khan, A.; Ekramul Hossain, M.; Ali Moni, M. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 2019, 19, 281.
  35. Khalilia, M.; Chakraborty, S.; Popescu, M. Predicting disease risks from highly imbalanced data using random forest. BMC Med. Inform. Decis. Mak. 2011, 11, 51.
  36. Lindner, C. Chapter 1—Automated Image Interpretation Using Statistical Shape Models. In Statistical Shape and Deformation Analysis: Methods, Implementation and Applications, 1st ed.; Zheng, G., Li, S., Szekely, G., Eds.; Elsevier: Amsterdam, The Netherlands, 2017; pp. 3–32.
  37. Couronné, R.; Probst, P.; Boulesteix, A. Random forest versus logistic regression: A large-scale benchmark experiment. BMC Bioinform. 2018, 19, 270.
  38. Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160.
  39. Lee, H.W.; Yip, T.C.-F.; Tse, Y.-K.; Wong, G.L.-H.; Kim, B.K.; Kim, S.U.; Park, J.Y.; Kim, D.Y.; Chan, H.L.-Y.; Ahn, S.H.; et al. Hepatic decompensation in cirrhotic patients receiving antiviral therapy for chronic hepatitis B. Clin. Gastroenterol. Hepatol. 2021, 19, 1950–1958.e7.
  40. Lai, C.L.; Rosmawati, M.; Lao, J.; Van Vlierberghe, H.; Anderson, F.H.; Thomas, N.; Dehertogh, D. Entecavir is superior to lamivudine in reducing hepatitis B virus DNA in patients with chronic hepatitis B infection. Gastroenterology 2002, 123, 1831–1838.
  41. Chang, T.-T.; Gish, R.G.; de Man, R.; Gadano, A.; Sollano, J.; Chao, Y.-C.; Lok, A.S.; Han, K.-H.; Goodman, Z.; Zhu, J.; et al. A comparison of entecavir and lamivudine for HBeAg-positive chronic hepatitis B. N. Engl. J. Med. 2006, 354, 1001–1010.
  42. Allard, N.L.; MacLachlan, J.H.; Dev, A.; Dwyer, J.; Srivatsa, G.; Spelman, T.; Thompson, A.J.; Cowie, B.C. Adherence in chronic hepatitis B: Associations between medication possession ratio and adverse viral outcomes. BMC Gastroenterol. 2020, 20, 140.
  43. Fu, K.Y.; Hsieh, M.L.; Chen, J.A.; Hsieh, V.C.R. Association between medication adherence and disease outcomes in patients with hepatitis B-related cirrhosis: A population-based case-control study. BMJ Open 2022, 12, e059856.
  44. Ma, Y.; Lu, Q.; Yuan, F.; Chen, H. Comparison of the effectiveness of different machine learning algorithms in predicting new fractures after PKP for osteoporotic vertebral compression fractures. J. Orthop. Surg. Res. 2023, 18, 62.
  45. Liu, Y.; Ye, S.; Xiao, X.; Sun, C.; Wang, G.; Wang, G.; Zhang, B. Machine learning for tuning, selection, and ensemble of multiple risk scores for predicting type 2 diabetes. Risk Manag. Health Policy 2019, 12, 189–198.
  46. Abellana, D.P.M.; Lao, D.M. A new univariate feature selection algorithm based on the best–worst multi-attribute decision-making method. Decis. Anal. J. 2023, 7, 100240.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
