A LIME-Based Explainable Machine Learning Model for Predicting the Severity Level of COVID-19 Diagnosed Patients

Abstract: The fast and seemingly uncontrollable spread of the novel coronavirus disease (COVID-19) poses great challenges to already overloaded health systems worldwide. It thus underscores an urgent need for fast and effective triage. Such triage can help in implementing the measures necessary to prevent patient deterioration and conserve strained hospital resources. We examine two types of machine learning models, a multilayer perceptron artificial neural network and decision trees, to predict the severity level of illness for patients diagnosed with COVID-19, based on their medical history and laboratory test results. In addition, we combine the machine learning models with a LIME-based explainable model to provide explainability of the model prediction. Our experimental results indicate that the model can achieve up to 80% prediction accuracy for the dataset we used. Finally, we integrate the explainable machine learning models into a mobile application to enable the usage of the proposed models by medical staff worldwide.


Introduction
On 30 January 2020, the World Health Organization (WHO) declared the COVID-19 pandemic to be a public health emergency of international concern. As of September 2021, WHO had reported more than 232 million confirmed cases of COVID-19 worldwide and more than 4.7 million deaths (https://covid19.who.int, accessed on 11 October 2021). COVID-19 is characterized as a severe acute respiratory syndrome that is caused by the SARS-CoV-2 virus. While the epidemiological and clinical characteristics of COVID-19 are increasingly understood, risk factors underlying the transition from mild to severe disease in patients remain poorly understood [1].
The severity of COVID-19 disease varies considerably among patients, ranging from asymptomatic infection to mild upper-respiratory tract illness, to severe viral pneumonia with acute respiratory distress, respiratory failure, and thromboembolic events that can lead to death [1]. Therefore, evaluating a patient's likely severity of illness remains an important concern for both clinicians and policy makers. Such an understanding would assist practitioners in performing rapid, early, and effective triage, help conserve strained hospital resources, and deepen our understanding of the disease [2]. Evidence obtained before the introduction of the COVID-19 vaccine suggests that 6-10% of infected patients are likely to become critically ill, most of whom will require mechanical ventilation and intensive care [1,3]. However, currently there are few prognostic markers to forecast whether a patient with COVID-19 is likely to deteriorate to a critical condition and require intensive care. We thus suggest that data-driven risk evaluation based on machine learning approaches, such as artificial neural networks (ANNs) and decision trees, can and should play an important role in the prediction of morbidity and illness severity.
Machine learning [4][5][6][7][8][9] techniques are rapidly being employed for diverse medical applications [10], such as predicting diabetes [11] and liver diseases [12], detecting and predicting cancer [13], personalizing treatments [14], drug discovery [15], radiology [16], and many more. Such techniques employ statistical methods to identify complex relationships among patient medical attributes and outcomes, using large datasets. Two major medical domains that currently leverage machine learning usage are diagnosis and outcome prediction. In particular, machine learning can be a highly valuable tool for identifying individuals at high risk of health deterioration.
Clinical reports of COVID-19 suggest that age, sex, and underlying comorbidities, such as renal diseases, hypertension, cardiovascular disease, and diabetes, can adversely affect patient outcomes [1]. To date, most machine learning applications for the COVID-19 pandemic have focused on patient diagnosis [17] and projecting the number of infected individuals and mortality rates [18]. Very few studies [19] have leveraged machine learning to systematically explore risk factors to assess the likely severity of disease in patients diagnosed with COVID-19 and to predict patient outcomes based on early clinical data.
In this study, we introduce two types of machine learning models, a multilayer perceptron (MLP) artificial neural network (ANN) [4] and Random Forest (RF) decision trees [20], to predict the level of severity of illness in patients diagnosed with COVID-19. The level of severity can be described as one of multiple clinical stages of a patient's hospital stay, such as admission to intensive care, the need for invasive mechanical ventilation, and in-hospital mortality. Our machine learning models were trained and validated using an open dataset provided by the Mexican Federal Health Secretary through the General Director of Epidemiology, see Supplementary Materials [21]. The dataset contains data collected at patients' initial presentation at the emergency department (ED), from May 2020 until October 2020, and comprises more than 50,000 cases.
Despite their growing use in the field of medicine, machine learning models remain extremely difficult to interpret, as they often lack explainability. In this study, we introduce a combined model that integrates a LIME-based explainability model with a machine learning model [22]. The combined model can explain the predictions of the MLP classifiers in an interpretable and accurate manner. More importantly, our explainability model may help physicians to identify the factors that most influence a patient's severity of illness. These properties would help medical staff to take preventive measures (when possible) to reduce a patient's risk of deterioration.
We summarize the contributions of this study as follows:
1. We introduce MLP and decision tree machine learning models to predict the level of severity of illness in patients diagnosed with COVID-19. Our experimental results indicate that our models can achieve up to 80% prediction accuracy.
2. Applying the proposed machine learning models can assist medical staff in performing fast, early, and effective triage, thereby taking the necessary measures to prevent patient deterioration and conserve strained medical resources.
3. We integrate a LIME-based explainability model with the machine learning model and demonstrate how such an integrated model can be used to provide further insights into an individual patient's medical condition and can identify the factors that most contribute to the patient's illness severity.
4. Finally, integration into a mobile application makes the proposed models accessible and easy to use by medical staff everywhere.
The rest of this paper is organized as follows: Section 2 reviews prior related works, Section 3 describes the dataset of patients diagnosed with COVID-19 that we used for our study, Section 4 presents our machine learning models, Section 5 introduces the explainability method that has been combined with our MLP model, Section 6 presents our mobile application design, and Section 7 summarizes our conclusions and suggestions for future studies.

Prior Works
Various studies have used machine learning techniques to study the COVID-19 pandemic. Yet all studies faced a common challenge: the lack of a high-quality, COVID-19-related medical dataset that could be easily processed by machine learning models. Related prior works can be divided into two main groups: pandemic behavior forecasting and diagnostic applications.

COVID-19 Pandemic Behavior Forecasting
Reference [18] suggested a hybrid machine learning approach of adaptive network-based fuzzy inference system to predict time series of infected individuals and mortality rates. Other researchers proposed deep learning models to predict the exponential behavior and reach of the COVID-19 pandemic worldwide by utilizing the real-time information from the Johns Hopkins dashboard [23]. Reference [24] suggested an autoregressive integrated moving average model to predict the spread of COVID-19 for the next two days. A time series method to analyze COVID-19 outbreak trends by performing statistical analysis was introduced by Reference [25]. Researchers also presented linear regression, multilayer perceptron, and vector autoregression methods to predict the pace of COVID-19 spread in India [26]. Each of these studies attempts to predict the behavior of COVID-19 at the macro level in order to assist governments and public health officials in managing pandemic strategies.

Diagnostic Applications
Several studies [27][28][29][30] monitored chest X-ray and CT image changes prior to COVID-19 symptoms and found that such techniques play an important role in early diagnosis and treatment of COVID-19. Reference [31] suggested a machine learning-based sensor for facial thermal scans to detect COVID-19 patients. Reference [32] developed a machine learning, speech processing method to detect COVID-19 patients from coughs. Machine learning and AI-based smartphone sensors were used by Reference [33] to detect COVID-19 symptoms. Reference [34] attempted to predict patients' risk of a COVID-19 diagnosis using a machine learning model trained on routinely collected data from emergency care admissions. The work most related to our study was presented by Reference [35], which suggested a machine learning model to predict patients at risk for developing acute respiratory distress syndrome (ARDS), a severe outcome in COVID-19. Their models achieved 70-80% prediction accuracy. It should be noted that their model focused on only one level of severity and did not include any explainability methods.

Dataset of Patients Diagnosed with COVID-19
Our study is based on an open dataset provided by the Mexican Federal Health Secretary through the General Director of Epidemiology. The dataset comprises anonymous records for both patients diagnosed with COVID-19 and non-COVID-19 patients, which were collected from May 2020 until 28 October 2020. The dataset has been publicly released online for downloading [21]. Of the dataset records, we extracted only those patients diagnosed with COVID-19; all non-COVID-19 records were excluded. The dataset was also preprocessed to remove any corrupted records or records with missing fields. The dataset (following preprocessing) comprised more than 50,000 patients diagnosed with COVID-19.
Each record of a patient diagnosed with COVID-19 consists of two parts:
1. The patient's medical history and symptoms upon arrival at the hospital, described in Table 1. All fields in Table 1 are used as input features for our machine learning models.
2. The patient's clinical outcome, summarized in Table 2. All fields in Table 2 are combined to create labels that represent the severity of COVID-19 morbidity.
The dataset was partitioned such that 70% of the records were used for the training process and 30% for validation. Every record in the dataset represents one anonymous patient and includes the fields shown in Tables 1 and 2.
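The 70%/30% partition described above can be sketched as follows. This is a minimal illustration and not the authors' preprocessing code: the shuffle, the fixed seed, and the `patient_id` placeholder records are assumptions introduced here.

```python
import random

def train_validation_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation lists."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in records; real records would carry the Table 1 and Table 2 fields.
records = [{"patient_id": i} for i in range(1000)]
train, validation = train_validation_split(records)
print(len(train), len(validation))
```

Shuffling before the cut avoids a split that follows the collection order of the records, which matters when records are stored chronologically.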

Machine Learning Model
In our study we examine two types of machine learning models, MLP [4] and RF decision trees [20], which are presented in detail in this section. Our machine learning models are trained using the COVID-19 dataset (described in Section 3).

Machine Learning Models
As part of our analysis, we examined various fully connected MLP models and hyperparameters and found that the two MLP models illustrated in Figure 1 possessed the highest prediction accuracy for the examined dataset. Each model is a fully connected network that consists of three layers (besides the input layer): (1) the first layer has 24 neurons, each connected to the input features described in Table 1, (2) the second layer has 16 neurons, and (3) an output layer. The output layer in the first model (MLP model A) has a single neuron, which provides a binary prediction (low or high) for a COVID-19 patient's level of disease severity. For the second model (MLP model B), the output layer has three neurons that predict severity in three levels (low, medium, or high). Each neuron in our MLP models uses the sigmoid activation function.
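The layer structure described above (24 neurons, 16 neurons, then 1 or 3 output neurons, all with sigmoid activations) can be sketched as a plain forward pass. This is an illustrative sketch only: the weights are random and untrained, `N_FEATURES = 18` is a placeholder for the number of Table 1 fields, and the 0.5 threshold and arg-max decision rules are assumptions rather than details given in the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_inputs, n_neurons, rng):
    """Return (weights, biases) for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def build_mlp(n_features, n_outputs, seed=0):
    """Layer sizes follow the text: n_features -> 24 -> 16 -> n_outputs."""
    rng = random.Random(seed)
    sizes = [n_features, 24, 16, n_outputs]
    return [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]

def forward(mlp, features):
    """Propagate one feature vector through every layer with sigmoid activations."""
    activations = list(features)
    for weights, biases in mlp:
        activations = [sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
                       for row, b in zip(weights, biases)]
    return activations

def count_parameters(mlp):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in mlp)

N_FEATURES = 18  # assumption: placeholder for the number of Table 1 input fields
model_a = build_mlp(N_FEATURES, n_outputs=1)  # binary: low / high
model_b = build_mlp(N_FEATURES, n_outputs=3)  # three levels: low / medium / high

patient = [0.0] * N_FEATURES  # hypothetical, already-encoded feature vector
score_a = forward(model_a, patient)[0]
label_a = "high" if score_a >= 0.5 else "low"  # assumed decision threshold
scores_b = forward(model_b, patient)
label_b = ["low", "medium", "high"][scores_b.index(max(scores_b))]  # assumed arg-max rule
print(label_a, label_b, count_parameters(model_a), count_parameters(model_b))
```

Because the weights are untrained, the printed labels carry no clinical meaning; the sketch only shows how the two models differ in their output layer.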
The mapping of severity levels to medical conditions in each model is summarized in Table 3, based on the dataset fields shown in Table 2. For MLP model A, the low severity level includes patient hospitalization or discharge, while in MLP model B it includes only patients who were not hospitalized. The high severity level includes intubation and [...], while MLP model A also includes ICU admission. The medium severity level exists only in MLP model B and includes the patient's hospitalization or ICU admission. To avoid biasing the models' training process, an equal number of records [...].
Our experimental analysis examined a range of hyperparameters and used those that achieved the highest prediction accuracy measured on the validation dataset. Throughout the training process, the learning rate was 0.002, the batch size was 20, and the number of batches per epoch was 200 for Model A and 1000 for Model B. We used the Adam gradient-based optimizer [36] for the model back-propagation training. The list of all MLP model hyperparameters is summarized in Table 4.
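The severity mapping and the class balancing described above can be sketched as follows. Table 3 remains the authoritative mapping. The field names (`hospitalized`, `icu`, `intubated`, `died`), the treatment of in-hospital death as high severity, and the use of random undersampling to equalize class sizes are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def severity_model_a(outcome):
    """Two levels: ICU admission, intubation, or (assumed) death -> high."""
    if outcome["icu"] or outcome["intubated"] or outcome["died"]:
        return "high"
    return "low"  # discharged, or hospitalized without escalation

def severity_model_b(outcome):
    """Three levels: hospitalization or ICU admission forms the medium level."""
    if outcome["intubated"] or outcome["died"]:
        return "high"
    if outcome["hospitalized"] or outcome["icu"]:
        return "medium"
    return "low"  # not hospitalized

def balance_by_label(records, label_fn, seed=0):
    """Undersample so that every severity level has the same number of records."""
    groups = defaultdict(list)
    for record in records:
        groups[label_fn(record)].append(record)
    n_per_class = min(len(group) for group in groups.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], n_per_class))
    return balanced

# Hypothetical outcome records: 6 not hospitalized, 3 hospitalized, 2 intubated.
records = (
    [{"hospitalized": False, "icu": False, "intubated": False, "died": False}] * 6
    + [{"hospitalized": True, "icu": False, "intubated": False, "died": False}] * 3
    + [{"hospitalized": True, "icu": True, "intubated": True, "died": False}] * 2
)
balanced_a = balance_by_label(records, severity_model_a)
balanced_b = balance_by_label(records, severity_model_b)
print(len(balanced_a), len(balanced_b))
```

Undersampling discards records from the larger classes; it is the simplest way to obtain equal class sizes, though other balancing schemes would also satisfy the description in the text.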

Random Forest Model
Decision trees serve as predictive models that use a set of binary rules to calculate a target value. There are two types of decision trees: classification and regression trees. Classification trees include categorical target variables that are divided into categories represented by the tree leaves. Tree branches represent conjunctions of features that lead to the class labels. Unlike classification trees, in regression trees, the target variable has continuous values. In our study, we use Random Forest (RF) classification trees [20]. RF is an ensemble of multiple classification trees, where the prediction is produced by a voting process of all the decision trees.
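The voting process mentioned above can be illustrated with a toy ensemble. The three one-rule "trees" and their thresholds are hypothetical and are not learned from the dataset; the sketch only shows how individual tree outputs are combined into one prediction.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties go to the first class seen)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Three hypothetical one-rule "trees"; a trained forest would learn its rules from data.
def tree_age(patient):
    return "high" if patient["age"] >= 70 else "low"

def tree_pneumonia(patient):
    return "high" if patient["pneumonia"] else "low"

def tree_diabetes(patient):
    return "high" if patient["diabetes"] else "low"

forest = [tree_age, tree_pneumonia, tree_diabetes]
patient = {"age": 74, "pneumonia": True, "diabetes": False}
votes = [tree(patient) for tree in forest]
prediction = majority_vote(votes)
print(votes, prediction)
```

Some library implementations average the class probabilities of the trees instead of counting hard votes; both schemes aggregate the trees' outputs into a single class.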
Through our experimental analysis, we examined various RF models and a broad range of hyperparameter values, then used those that achieved the highest prediction accuracy measured on the validation dataset. Similar to the MLP models, we suggest two types of RF models. The first model (RF model A) has two output classes, which correspond to the severity levels of MLP model A, while the second model (RF model B) employs three classes, which correspond to the MLP model B outputs. The RF models were trained and validated on the same datasets used for the MLP models. The hyperparameter values that produced the best prediction accuracy for the dataset used are summarized in Table 5. Our RF models used 50 trees in the forest; our experiments indicated that a greater number of trees did not improve model performance. The models used bootstrap samples to build the trees. The minimum number of samples required at a leaf node was 3, and the minimum number of samples required to split an internal node was 10. The number of features considered when looking for the best split was the square root of the number of model features. The maximum depth of a tree was 6, and there was no limit on the number of leaf nodes.

Experimental Results
Figure 2a,b illustrate the prediction accuracy of each MLP model throughout the training process. In the validation phase, the overall prediction accuracy was measured using the validation dataset. MLP model A achieved 80% accuracy, while MLP model B achieved approximately 78% accuracy. Figure 3a,b depict the confusion matrices of the two MLP models. MLP model A achieved 92% and 72% prediction accuracies for patients diagnosed with COVID-19 with a low and a high level of severity, respectively. MLP model B achieved a prediction accuracy of 90%, 74%, and 77% for patients diagnosed with COVID-19 with a low, medium, and high level of severity, respectively.
The RF models produced 80% prediction accuracy for RF model A and 65% for RF model B. Figure 3c,d present the confusion matrices of the RF models. RF model A achieved 75% and 85% prediction accuracies for patients with a low and a high level of illness severity, respectively. RF model B achieved a prediction accuracy of 92%, 33%, and 71% for patients diagnosed with COVID-19 with a low, medium, and high level of severity, respectively.
We compare the accuracy of the machine learning model (MLP model A) to a binary regression model, applied to the same training and validation datasets. Figure 3e illustrates the confusion matrix of the binary regression model. The overall prediction accuracy of the binary regression model is 76.3%, 3.7 percentage points lower than the machine learning model. For low severity illness, the binary regression model accuracy is 79.3%, whereas the machine learning model achieves 92% accuracy. For high severity illness, the binary regression model accuracy is 2.2 percentage points lower than the machine learning model accuracy. The prediction accuracy of all the models examined is summarized in Table 6.
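The per-level and overall accuracies quoted above can be read off a confusion matrix as sketched below. The sketch assumes that rows hold the true severity level and columns the predicted level, so that the per-level accuracy is the row-normalized diagonal; the counts are hypothetical and are not taken from Figure 3.

```python
def per_class_accuracy(confusion):
    """confusion[i][j] = number of patients of true class i predicted as class j."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

def overall_accuracy(confusion):
    """Fraction of all patients whose predicted class matches the true class."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical two-level matrix: rows are true (low, high), columns are predicted (low, high).
confusion = [
    [90, 10],
    [30, 70],
]
class_acc = per_class_accuracy(confusion)
total_acc = overall_accuracy(confusion)
print(class_acc, total_acc)
```

The overall accuracy is the count-weighted mean of the per-level accuracies, so it depends on how many validation records fall in each level and need not equal their simple average.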
The prediction accuracy results presented in Table 6 do not include model cross-validation; applications are therefore strongly encouraged to perform cross-validation before deployment in the field. It should be noted that the prediction accuracy achieved by our proposed machine learning models is highly dependent on the quality of the dataset. Although the dataset used is limited, the results achieved are superior to those of similar works on this dataset (https://www.kaggle.com/tanmoyx/covid19-patient-precondition-dataset?select=covid.csv, accessed on 28 October 2021).
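The cross-validation recommended above could be organized as a k-fold scheme, sketched below. The choice of k = 5 and the `train_and_score` callback are assumptions for illustration; the callback is a hypothetical function that trains a model on the training indices and returns its accuracy on the validation indices.

```python
def k_fold_indices(n_records, k=5):
    """Yield (train_indices, validation_indices); each record is validated exactly once."""
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, validation
        start += size

def cross_validate(n_records, train_and_score, k=5):
    """Average the validation score returned by train_and_score over all k folds."""
    scores = [train_and_score(train, validation)
              for train, validation in k_fold_indices(n_records, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, k=3))
print([len(validation) for _, validation in folds])
```

The records should be shuffled once before the folds are formed, so that each fold contains a comparable mix of severity levels.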

A LIME-Based Explainability Model
Recent studies have introduced new explainability methods [22] that can explain the predictions of machine learning classifiers in an interpretable and accurate manner. As part of this study, we extended our machine learning model by using the Local Interpretable Model-agnostic Explanations (LIME) model [37]. The LIME model can explain inferences produced by machine learning models by performing a local approximation around the inference point. The LIME algorithm builds a linear regression in proximity to the specific inference point that needs to be explained. Features with a high positive weight in the linear regression approximation support the prediction decision, while features with a negative weight oppose the decision. The original LIME model suffers from a stability issue, i.e., when the model is employed repeatedly under the same conditions, it may return different explanations. To overcome the stability issue, we use an enhanced version of the LIME model with statistical stability indices (available at https://github.com/giorgiovisani/LIME_stability, accessed on 28 October 2021), which was introduced by Reference [38].
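The local linear approximation described above can be sketched in a few lines. This is a simplified illustration of the idea, not the LIME library or the stability-enhanced variant used in this study: the binary features, the uniform perturbation scheme, the exponential kernel with width 0.75, and the `black_box` scoring function are all assumptions. The black box is deliberately linear so that the recovered weights can be checked by eye.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def explain_locally(predict, instance, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate of `predict` around `instance`."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.randint(0, 1) for _ in range(d)]  # random binary perturbation
        distance = sum(zi != xi for zi, xi in zip(z, instance)) / d
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in z])  # leading 1.0 is the intercept
        targets.append(predict(z))
    size = d + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(size)]
            for i in range(size)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(size)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]

def black_box(z):
    """Hypothetical stand-in for a trained classifier's high-severity score."""
    pneumonia, diabetes, asthma = z
    return 0.2 + 0.5 * pneumonia + 0.2 * diabetes - 0.1 * asthma

feature_names = ["pneumonia", "diabetes", "asthma"]
instance = [1, 1, 0]
intercept, coefs = explain_locally(black_box, instance)
ranked = sorted(zip(feature_names, coefs), key=lambda kv: abs(kv[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

With a nonlinear classifier, the surrogate weights would depend on the random perturbations drawn, which is the source of the stability issue noted above; fixing the seed hides that variation rather than removing it.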
We demonstrate the usage of the LIME model on MLP model A by choosing a rather complex case for analysis. Figure 4 exemplifies an explanation for a patient diagnosed with COVID-19. The patient's medical history and symptoms on arrival at the hospital are summarized in Figure 4a. The patient is over 70 years old with a medical history including pneumonia, diabetes, chronic renal failure, and cardiovascular disease. Our machine learning model (MLP model A) predicts the patient's severity of illness to be high. The LIME explanation is illustrated in Figure 4b. The blue bars illustrate the medical background and symptoms that have significant weight in supporting the prediction, while the green bars illustrate factors that comprise evidence against it. The explanation indicates that at the time of the prediction, the patient's main symptoms and medical history that most contribute to the prediction are pneumonia, age, diabetes, chronic renal failure, cardiovascular disease, asthma, other diseases, and exposure to other patients diagnosed with COVID-19.
We compare the explanation provided by the LIME model with the binary regression model introduced in Section 4. The independent variables in the binary regression equation are presented in Figure 4c with their corresponding weights (B) and statistical significance (Sig.). Figure 4c indicates that sex, pneumonia, age, diabetes, asthma, immune system suppression, and obesity are significant contributors to a high severity of illness. One may notice that factors such as chronic renal failure, cardiovascular disease, and exposure to other COVID-19 patients have not been considered major contributors to severe illness by the binary regression model. This is in contrast to our explainable machine learning model, which has identified these factors as contributing to a high severity level of illness in the described case analysis. This example demonstrates the limitation of binary regression in explaining complex morbidity. While the LIME model provides explainability per individual patient, the binary regression model provides a global statistical explainability.
We summarize this discussion by indicating that such an explainability capability introduces new opportunities to help physicians identify the factors that most influence a patient's severity of illness and deterioration.

Application Design
As part of this study, we integrated the explainable machine learning models into a mobile application. The mobile application is accessible (https://github.com/freddygabbay/covid19explainableML, accessed on 28 October 2021) for medical and healthcare staff to assess the severity of illness of patients diagnosed with COVID-19. The application development environment is based on the Kivy open-source platform [39] and can run on Android operating systems. The machine learning model was coded in Python using the Keras framework [40]. The application can be run offline, i.e., a network connection is not necessary to run the model. The application's user interface (UI) is shown in Figure 5 and includes the home page, patient background details form (including all the background information summarized in Table 1), the model prediction result, and an explanation of the prediction.

Conclusions
This study introduced machine learning models for predicting the severity of illness in patients diagnosed with COVID-19. The prediction is made based on a patient's medical history and symptoms on arrival at the hospital. The MLP machine learning models achieved 78-80% prediction accuracy, while the RF models produced 65-80% prediction accuracy. Our machine learning models were combined with the LIME explainability model and then integrated into a mobile application, which can support medical staff in performing effective triage based on medical tests to reduce a patient's risk of medical deterioration. In addition, our explainability model can offer further insight regarding an individual patient's medical condition and help identify the factors critically affecting the patient [...]
