Machine Learning Applied to Omics Datasets Predicts Mortality in Patients with Alcoholic Hepatitis

Alcoholic hepatitis is a major health care burden in the United States due to significant morbidity and mortality. Early identification of patients with alcoholic hepatitis at greatest risk of death is extremely important for proper treatments and interventions to be instituted. In this study, we used gradient boosting, random forest, support vector machine and logistic regression analysis of laboratory parameters, fecal bacterial microbiota, fecal mycobiota, fecal virome, serum metabolome and serum lipidome to predict mortality in patients with alcoholic hepatitis. Gradient boosting achieved the highest AUC of 0.87 for both 30-day mortality prediction using the bacteria and metabolic pathways dataset and 90-day mortality prediction using the fungi dataset, which showed better performance than the currently used model for end-stage liver disease (MELD) score.


Introduction
Alcohol use disorder is a major healthcare burden. Common consequences of heavy alcohol consumption include a wide spectrum of liver diseases, such as alcohol-associated steatosis, fibrosis and cirrhosis [1]. Alcoholic hepatitis represents the most severe manifestation of alcohol-related liver disease, with an annual incidence rate of 34 per million in women and 46 per million in men [2]. As a life-threatening disease, alcoholic hepatitis is associated with mortality rates of 15%, 24% and 56% at 28 days, 84 days and 5 years, respectively [2]. Severe alcoholic hepatitis is associated with a very high 90-day mortality of up to 75% [3]. Therefore, it is crucial to accurately determine the prognosis of patients presenting with acute alcoholic hepatitis. Early identification of alcoholic hepatitis patients at greatest risk of death is extremely important for the stratification of patients towards proper treatments, such as corticosteroids, liver transplantation or clinical trials.
Alcohol-associated liver disease is transmissible via fecal microbiota transfer in mice [4], and a small clinical trial showed survival benefits in patients with severe alcoholic hepatitis receiving daily fecal microbiota transplantation for 7 days from a healthy donor [5]. Thus, the intestinal microbiota is very important for development and disease outcome in patients with alcoholic hepatitis. The bacterial microbiota, fungal mycobiota and virome are involved in pathogenesis of alcoholic hepatitis [6][7][8][9]. Alcoholic hepatitis is also accompanied by a profound dysfunction of the intestinal barrier leading to bacterial translocation to the liver and worse disease outcome [10]. Common serum biomarkers used to evaluate gut barrier dysfunction include anti-Saccharomyces cerevisiae antibodies (ASCA), zonulin and lipopolysaccharide binding protein (LBP). ASCA are systemic antibodies against fungal antigens [11]. Serum zonulin is a surrogate marker for intestinal permeability [12,13]. LBP is synthesized in response to translocated LPS and serves as an additional biomarker for gut barrier dysfunction [14].
As a subset of artificial intelligence, machine learning is an umbrella term for a variety of computational tools for early diagnosis and prognosis, including classification models such as gradient boosting, random forest, and support vector machine. Machine learning generates predictive models by detecting hidden patterns within large datasets. Because many variables can affect the clinical outcome, it is often difficult for a physician to predict a given outcome with certainty. Machine learning algorithms can incorporate multiple risk factors and identify nuanced interactions between outcomes and variables, which allows them to find new patterns among risk factors. Predicting clinical outcomes using a profiling dataset with a large number of variables has drawn great interest over the past years. For instance, clinical data and microbiota-based multi-omics have been used to predict outcome or severity of diseases such as nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH) [15,16].
In the present study, we demonstrate the use of machine learning tools to predict 30-day and 90-day mortality in patients with alcoholic hepatitis using clinical data. In particular, we compare four popular models: gradient boosting, random forest, support vector machine, and logistic regression models. Our second aim is to identify key features associated with high mortality from multi-omics with a particular focus on datasets derived from a global characterization of the gut microbiota.

Mortality Prediction with Clinical Data in Patients with Alcoholic Hepatitis
A total of 210 patients with alcoholic hepatitis were included in this study (Table 1). Of these, 31 (14.8%) patients died within 30 days. Among 179 patients alive at day 30, 23 patients (12.8%) died within 90 days, 104 patients were alive at 90 days, while the remaining 52 patients were lost to follow-up (Figure 1A,B). The model for end-stage liver disease (MELD) score is currently used in clinical practice to predict mortality in alcoholic hepatitis patients. In our dataset, the area under the receiver operating characteristic curve (AUC) was 0.78 and 0.82 for the logistic regression model when predicting 30-day and 90-day mortality, respectively, using MELD score (Figure 1C).
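To make the AUC values reported throughout easier to interpret, the following is a minimal sketch of the rank-based definition of AUC: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case (label 1, e.g. died) has the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up risk scores (not study data)
died = [0, 0, 1, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]
print(roc_auc(died, risk))
```

Under this definition, an AUC of 0.5 corresponds to random ranking, and a value below 0.5 means the score ranks survivors above non-survivors more often than not.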
To help physicians make clinical decisions more precisely, we developed four models using 11 routine clinical laboratory variables to predict mortality in patients with alcoholic hepatitis (Table 2). Hereafter, we refer to these 11 clinical laboratory variables as Clinical data. These 11 variables were selected based on the availability of clinical data collected and the missing-value rate. Only clinical parameters with a missing-value rate less than 20% were selected. When predicting 30-day mortality, the AUC ranged from 0.74 to 0.81 across the four models: gradient boosting, logistic regression, random forest and support vector machine (Figure 1D). The AUC for 90-day mortality prediction ranged from 0.79 to 0.80 across these models (Figure 1E). Among these four models, gradient boosting attained the highest AUC for 30-day mortality prediction, and the other three models attained the highest AUC for 90-day mortality prediction. In particular, the prediction of 30-day mortality in patients with alcoholic hepatitis from the random forest and gradient boosting was slightly better than the currently used MELD score in clinical practice (Table 3).
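The missing-value criterion described above can be sketched as a simple filter. This is an illustration only: the function name, the use of `None` to mark a missing measurement, and the variable names `lab_a`, `lab_b` and `lab_c` are assumptions, and the values are made up.

```python
from typing import Dict, List, Optional


def select_by_missing_rate(
    columns: Dict[str, List[Optional[float]]], max_rate: float = 0.20
) -> List[str]:
    """Return the variables whose fraction of missing values (None)
    is strictly below max_rate."""
    kept = []
    for name, values in columns.items():
        rate = sum(v is None for v in values) / len(values)
        if rate < max_rate:
            kept.append(name)
    return kept


# Toy data with hypothetical variable names (not study data)
toy = {
    "lab_a": [1.1, None, 0.9, 1.4, 2.0, 1.0, 1.2, 0.8, 1.5, 1.3],  # 10% missing
    "lab_b": [None, None, 300.0, None, 410.0],                      # 60% missing
    "lab_c": [1.0, None, 2.0, 3.0, 4.0],                            # exactly 20% missing
}
print(select_by_missing_rate(toy))
```

Because the text specifies a rate of less than 20%, a variable with exactly 20% missing values is excluded in this sketch.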

Selected Variables from Multi-Omics Datasets
To further improve the performance of mortality prediction for patients with alcoholic hepatitis, we collected multi-omics data, including fecal bacterial microbiome, fecal fungal mycobiome, fecal virome, serum metabolome and lipidome (Figure 2). Due to limited sample availability, multi-omics data were collected from only a subset of the patient cohort, and multiple imputation was applied to the multi-omics data to retain all samples with missing values. After multiple imputation, we used random forest to select the 11 variables with the highest average feature importance in each multi-omics dataset, so that the number of variables included in the models using the multi-omics data and using the clinical data was the same. The 11 selected variables are listed in Table 2. When implementing the random forest, we forced the component variables used to calculate the MELD score (creatinine, bilirubin, international normalized ratio, and sodium) to be selected for splitting at each node in the trees. Then, using these 11 selected features for each multi-omics dataset, we built gradient boosting, logistic regression, random forest and support vector machine models to predict short-term mortality. In order to compare the performance of each model, we calculated the AUC score for the logistic regression model using MELD score only based on the same subset of patients for each multi-omics dataset.
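The selection of the top 11 variables by average feature importance can be sketched as follows. The sketch assumes that per-iteration importances have already been computed by a random forest and are supplied as dictionaries; it does not fit a forest or reproduce the forced splitting on MELD components. The function name and toy feature names are hypothetical.

```python
from typing import Dict, List


def top_k_by_mean_importance(per_fold: List[Dict[str, float]], k: int = 11) -> List[str]:
    """Average each feature's importance over CV iterations and return
    the k features with the highest mean importance."""
    totals: Dict[str, float] = {}
    for fold in per_fold:
        for name, value in fold.items():
            totals[name] = totals.get(name, 0.0) + value
    n_folds = len(per_fold)
    # A feature absent from a fold contributes zero to its mean
    means = {name: total / n_folds for name, total in totals.items()}
    # Descending mean importance; ties broken by name for reproducibility
    ranked = sorted(means, key=lambda name: (-means[name], name))
    return ranked[:k]


# Toy importances from two CV iterations (made-up numbers)
folds = [
    {"a": 4.0, "b": 2.0, "c": 1.0},
    {"a": 2.0, "b": 5.0, "c": 1.0},
]
print(top_k_by_mean_importance(folds, k=2))
```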

Fecal Bacteria and MetaCyc Pathways
Bacteria and MetaCyc pathways were available for 73 patients at 30 days and for 53 patients at 90 days. For these patients, AUCs were 0.79 and 0.63 when predicting 30-day and 90-day mortality using the logistic regression model with MELD score only, respectively (Figure 3A). Among the four models, bacteria, metabolic (MetaCyc) pathways and clinical data achieved the highest AUC of 0.87 for 30-day mortality using the gradient boosting model, and 0.69 for 90-day mortality using the support vector machine model (Figure 3B,C), both of which were higher than the AUC based on MELD score only (Table 3).

Fecal Fungal Datasets
Fecal fungi were available for 54 patients at 30 days and 39 patients at 90 days. For these patients, AUC was 0.72 and 0.25 for the logistic regression model when predicting 30-day and 90-day mortality using MELD score only, respectively (Figure 4A). Applying the support vector machine model to fecal fungi and clinical data, the highest AUC of 0.86 was achieved for 30-day mortality. Meanwhile, the highest AUC of 0.87 was achieved for 90-day mortality by the gradient boosting model (Figure 4B,C). All four models based on fecal fungi and clinical laboratory data performed better when predicting 90-day mortality than the logistic regression model based on MELD score (Table 3).

Fecal Viral Datasets
Fecal virome analysis was available for 76 patients at 30 days and 56 patients at 90 days. For these patients, AUC was 0.72 and 0.67 for the logistic regression model when predicting 30-day and 90-day mortality using MELD score only, respectively (Figure 5A). Among the four models, viral and clinical laboratory data achieved the highest AUC of 0.87 for 30-day mortality using the logistic regression model, and the highest AUC of 0.63 for 90-day mortality using the support vector machine model (Figure 5B,C). The logistic regression, support vector machine, and random forest models based on the viral and clinical laboratory data performed better when predicting 30-day mortality than the logistic regression model based on MELD score (Table 3).

Serum Metabolites and Lipids
Serum metabolites and lipids were available for 118 patients at 30 days and 90 patients at 90 days. For these patients, AUC was 0.77 and 0.83 for the logistic regression model when predicting 30-day and 90-day mortality using the MELD score only, respectively (Figure 6A). Among the four models, metabolites and lipids achieved the highest AUC of 0.74 for 30-day mortality using the support vector machine model, and the highest AUC of 0.78 for 90-day mortality using the random forest model (Figure 6B,C).

ASCA, Zonulin and LBP
In addition to multi-omics datasets, we also evaluated routine laboratory parameters together with serum biomarkers of gut barrier dysfunction, ASCA, zonulin and LBP. These data were available for 138 patients at 30 days and 114 patients at 90 days. For these patients, AUC was 0.77 and 0.79 for the logistic regression model when predicting 30-day and 90-day mortality using the MELD score only, respectively (Figure 7A). Among the four models, the highest AUC was 0.76 for 30-day mortality using the random forest model, and 0.71 for 90-day mortality using the random forest and gradient boosting models (Figure 7B,C).
A summary of AUC scores for each dataset is shown in Table 3. Multi-omics or serum biomarkers combined with routine clinical laboratory parameters improved the performance for the prediction of 30- and 90-day mortality in patients with alcoholic hepatitis, with the highest AUC achieved being 0.87 (gradient boosting using bacteria, MetaCyc pathways and clinical data, as well as logistic regression using viral and clinical data) and 0.87 (gradient boosting model using fungi and clinical data), respectively.

Discussion
The identification of patients with alcoholic hepatitis at greatest risk of death is necessary for treatment stratification towards early liver transplantation, prednisolone therapy, clinical trial or supportive care. Invasive testing with liver biopsy can lead to increased morbidity, and is currently recommended to confirm the diagnosis of alcoholic hepatitis only in the presence of potential confounding factors or if treatment with immunosuppressive therapy is considered [17]. Therefore, non-invasive scoring systems are important, and various prognostic clinical models have been developed and applied to patients to assess the severity of alcoholic hepatitis. The AUCs for the discriminant function (DF) were significantly lower than for MELD, the age, serum bilirubin, international normalized ratio and serum creatinine (ABIC) score, and Glasgow alcoholic hepatitis score (GAHS) for both 28- and 90-day outcomes: 90-day values were 0.670, 0.704, 0.726 and 0.713, respectively [18].
The implementation of machine learning models has rapidly increased in the biomedical field, including liver diseases, in recent years [15,16,19,20]. For predicting cirrhosis in patients with non-alcoholic fatty liver disease (NAFLD), a random forest algorithm integrating shotgun metagenomic and untargeted metabolomic profiles achieved an AUC of 0.91 [21]. However, a promising model for mortality prediction has not been applied to patients with alcoholic hepatitis. In the present study, we developed four models to predict short-term mortality, and some of them showed better performance than the currently used MELD score. In particular, the gradient boosting analysis of bacteria and metabolic pathways datasets achieved the highest AUC of 0.87 for 30-day mortality prediction. Among the selected bacteria and metabolic pathways used for the 30-day mortality prediction, 6 pathways were related to purine nucleoside biosynthesis. Nucleotide biosynthesis has been reported to be critical for the growth of bacteria in human blood [22]. The causal relationship between microbial nucleoside biosynthesis and mortality requires further investigation.
In addition to multi-omics study design and comparison of four models, the multicenter study design is another strength of this study, which recruited patients from diverse geographical origins. One limitation of this study was the relatively small sample size, given the complexity of the machine learning pipelines used. Despite using a test and validation cohort in our study, external validation with a larger number of patients is required to confirm our prediction model. Another limitation is that omics are not yet readily usable in routine clinical practice unless these methods become less expensive and more standardized.
In summary, this is the first comprehensive study to predict short-term mortality using different machine learning algorithms with multi-omics data covering not only serum metabolites, serum lipids and fecal bacteria, but also fecal fungi and viruses, which have not been well studied in alcoholic hepatitis. Such models may help physicians identify patients at greatest risk and make better clinical decisions for patients with alcoholic hepatitis.

Patients
A total of 210 patients diagnosed with alcoholic hepatitis were recruited from 10 institutions in the United States, Canada and Europe. The clinical picture was consistent with alcoholic hepatitis in all patients. The patient cohort has been described previously [6,8,23]. The inclusion criteria for alcoholic hepatitis were: (1) active alcohol use (> 50 g/day for men and > 40 g/day for women) in the last 3 months; (2) aspartate aminotransferase (AST) > alanine aminotransferase (ALT) and total bilirubin > 3 mg/dL in the past 3 months; (3) liver biopsy and/or clinical picture consistent with alcoholic hepatitis. The exclusion criteria were: (1) autoimmune liver disease (ANA > 1/320); (2) chronic viral hepatitis; (3) hepatocellular carcinoma; (4) complete portal vein thrombosis; (5) extrahepatic terminal disease; (6) pregnancy; (7) lack of signed informed consent. Liver biopsies were performed only if indicated as part of routine clinical care for the purpose of alcoholic hepatitis diagnosis. For patients who underwent liver biopsy, the liver histology was in line with the diagnosis of alcoholic hepatitis. The protocol was approved by the Ethics Committee of each participating center. Written informed consent was obtained from each subject. The MELD score was calculated for all alcoholic hepatitis patients whose required variables were available. Eleven clinical parameters were evaluated in the random forest model to predict the 30-day and 90-day mortality, including age, creatinine, bilirubin, albumin, international normalized ratio, alanine transaminase, alkaline phosphatase, platelet count, white blood cell count, aspartate transaminase, and sodium.

Shotgun Metagenomics
DNA was extracted from stool samples collected from 73 patients. DNA extraction and library preparation were performed as described previously [24]. Shotgun metagenomic sequencing was performed on an Illumina HiSeq 4000, generating 150 bp paired-end reads. KneadData version 0.7.2 was used for quality control. Metagenomic Phylogenetic Analysis 2 (MetaPhlAn2) version 2.7.7 was used for the profiling of the composition of the microbial community [25]. The HMP Unified Metabolic Analysis Network 2 (HUMAnN2) version 0.11.1 was used for the profiling of microbial pathways [26]. The MetaCyc database was used for microbial pathway analysis [27]. Each of the HUMAnN2 abundance outputs was normalized into relative abundance (the counts for each sample sum to 100).
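The relative-abundance normalization described above amounts to rescaling each sample so that its values sum to 100. A minimal sketch, assuming one sample is represented as a mapping from pathway name to abundance (the function name and toy pathway names are hypothetical):

```python
from typing import Dict


def to_relative_abundance(sample: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's abundances so that they sum to 100."""
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("sample has no positive abundance to normalize")
    return {name: 100.0 * value / total for name, value in sample.items()}


# Toy sample with made-up pathway abundances
raw = {"pathway_x": 2.0, "pathway_y": 6.0}
print(to_relative_abundance(raw))
```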

Mycobiome Analysis
Fecal mycobiomes from 54 patients were evaluated using internal transcribed spacer (ITS) sequencing targeting the fungal ITS1 region. Fungal ITS sequencing was performed using an Illumina MiSeq V2 kit (300 cycles). Primers, PCR conditions and data processing were described in our previous study [8].

Viral Metagenomics
Virus-like particles were isolated from fecal samples collected from 76 patients using differential filtration techniques followed by metagenomic sequencing. Viromes were prepared using the NetoVIR protocol, with minor modifications [28]. Briefly, resuspended fecal samples were filtered using a 0.8 µm (PES) filter (Sartorius). The remaining supernatant was subjected to lysis followed by viral DNA and RNA extraction. Library preparation was performed as described previously [9]. Clumpify and KneadData were used for the quality control of raw sequence reads. The PathSeq pipeline was used for the read alignment and taxonomy assignment [29].

Untargeted Metabolomics and Lipidomics
Serum metabolome and lipidome from 132 patients were analyzed on multiple platforms, including gas chromatography-time of flight mass spectrometry (GC-TOF MS), hydrophilic interaction liquid chromatography (HILIC) with quadrupole orbital ion trap high field mass spectrometry (Q-Exactive HF MS), and CSH-Q-Exactive HF MS. Sample extraction, data acquisition and data processing were performed as described in our previous study [30]. Briefly, ChromaTOF version 4.50 was used for baseline subtraction, deconvolution and peak detection for GC-MS raw data. BinBase version 5.0.3 was used for metabolite annotation and reporting [31]. For LC-MS raw data, MS-DIAL was used for peak picking, alignment, deconvolution and identification [32]. The level of confidence in the identification was level 3 [33]. MS-FLO was used for the identification of ion adducts, duplicate peaks and isotopic features [34]. For both the HILIC and lipidomics datasets, retention time-m/z libraries and MS/MS spectra databases were used for compound identification, which were uploaded to MassBank of North America.

Enzyme Linked Immunosorbent Assay (ELISA)
Anti-Saccharomyces cerevisiae antibody (ASCA)-IgG, zonulin and lipopolysaccharide binding protein (LBP) levels were measured in the serum samples collected from 132 patients using different ELISA kits, as described previously [8].

Machine Learning Models
The predictive power of eleven clinical parameters, multi-omics datasets, and three markers for intestinal permeability was evaluated for short-term mortality prediction in patients with alcoholic hepatitis. Logistic regression (LR) [35], support vector machine (SVM) [36], random forest (RF) [37], and gradient boosting (GB) [38] models were built using functions from scikit-learn in Python. Before any data preprocessing, we divided the datasets into 5 folds using stratified 5-fold cross-validation (CV). Multivariate imputation by chained equations (MICE) was used to impute the missing values in each feature [39]. To avoid data leakage from the training set to the test set, MICE was only applied on the training set inside each CV iteration, and the test set was imputed by the MICE model fitted on the training set. In order to deal with the class imbalance and improve the performance of the models, after imputation, the synthetic minority oversampling technique (SMOTE) was used to oversample the minority class in the training set only to obtain balanced data [40]. When the multi-omics datasets were used for the short-term mortality prediction, random forest from the ranger package in R was additionally applied to the training set before performing SMOTE to select 11 variables based on the average feature importance over the 5 CV iterations mentioned above [41]. The component variables used to calculate the MELD score were always selected for splitting at each node in trees when building a ranger random forest. For the main models (LR, SVM, RF, or GB), the default setting was used when building LR and SVM, and the number of trees and the maximum tree depth were chosen for tuning when building RF and GB. In order to choose the best set of hyperparameters in RF and GB, grid search with stratified 4-fold CV (inner CV) was performed in each CV iteration using the original training set (the one before doing MICE).
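The oversampling step can be illustrated with a minimal, standard-library sketch of the SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions (binary labels, numeric features without missing values, applied to a training fold only), not the implementation used in the study; the function name and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def smote(
    X: Sequence[Vector], y: Sequence[int], k: int = 5, seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority class until both classes have equal size."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("binary labels expected")
    counts = {c: sum(1 for v in y if v == c) for c in labels}
    minority = min(labels, key=lambda c: counts[c])
    majority_n = max(counts.values())
    minority_pts = [list(x) for x, c in zip(X, y) if c == minority]
    if len(minority_pts) < 2:
        raise ValueError("need at least two minority samples")

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    X_out = [list(x) for x in X]
    y_out = list(y)
    for _ in range(majority_n - len(minority_pts)):
        base = rng.choice(minority_pts)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority_pts if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
        y_out.append(minority)
    return X_out, y_out


# Toy training fold: 6 majority (label 0) and 2 minority (label 1) samples
X_train = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0],
           [7.0, 5.0], [7.0, 6.0], [0.0, 0.0], [1.0, 1.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))
```

Calling such a function on the training fold only, after imputation, mirrors the order of operations described above; the test fold is left untouched so that performance estimates reflect the original class distribution.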
In RF, the number of trees was chosen from 100, 200, 300 and 400, and the maximum tree depth from 5, 10, 15 and 20. In GB, the number of trees was chosen from 100, 200, 300 and 400, and the maximum tree depth from 1, 2, 3 and 4. In each inner CV iteration, MICE, feature selection, and SMOTE were performed again on the training subset to avoid data leakage from the training subset to the validation set, and the imputation model from this second MICE was used to impute the validation set. The set of hyperparameters with the highest F1 score evaluated on the validation set was chosen [40]. We then fit the main model with this best set of hyperparameters using the training set (the one after doing the SMOTE in Supplementary Figure S1A) and assessed the model on the test set for each outer CV iteration. The final model performance was the average of the model performance over the outer CV iterations. ROC analysis was used to evaluate performance for short-term mortality prediction in patients with alcoholic hepatitis. A more comprehensive description of the procedures is shown in Supplementary Figure S1.
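The hyperparameter search described above can be sketched as enumerating every combination in the grid and keeping the one with the highest F1 score. The grids below use the values stated in the text; the key names `n_trees` and `max_depth` are illustrative (in scikit-learn the corresponding parameters are `n_estimators` and `max_depth`). The `mean_inner_f1` callback is a hypothetical stand-in for running the inner 4-fold CV, and averaging F1 over the inner folds is an assumption.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

# Grids as stated in the text
RF_GRID = {"n_trees": [100, 200, 300, 400], "max_depth": [5, 10, 15, 20]}
GB_GRID = {"n_trees": [100, 200, 300, 400], "max_depth": [1, 2, 3, 4]}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def expand_grid(grid: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """List every hyperparameter combination in the grid."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def best_params(
    grid: Dict[str, List[int]], mean_inner_f1: Callable[[Dict[str, int]], float]
) -> Dict[str, int]:
    """Return the combination with the highest score from the callback."""
    return max(expand_grid(grid), key=mean_inner_f1)


def toy_scorer(params: Dict[str, int]) -> float:
    """Made-up score surface peaking at 200 trees and depth 3 (illustration only)."""
    return -abs(params["n_trees"] - 200) / 100 - abs(params["max_depth"] - 3)


print(len(expand_grid(GB_GRID)))
print(best_params(GB_GRID, toy_scorer))
```

In a full pipeline, the callback would impute, select features, oversample and fit within each inner training subset before scoring on the inner validation fold, so that no information from the validation or test data reaches the fitted model.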

Institutional Review Board Statement: The protocol used in this study was approved by the Ethics Committee of each participating center.
Informed Consent Statement: Written informed consent was obtained from each subject.
Data Availability Statement: Code and machine learning models are available on GitHub (https://github.com/morris16206/Alcoholic-hepatitis (accessed on 12 August 2021)). Raw 16S rRNA sequencing reads can be found in the National Center for Biotechnology Information (NCBI) SRA associated with BioProject PRJNA525701. Fungal sequencing data can be found under BioProject PRJNA517994. Shotgun metagenomics sequence data were deposited in the European Nucleotide Archive under accession number ERP106878. Metabolomics and lipidomics datasets are provided in this manuscript as supplementary files.