Prediction of Severity of COVID-19-Infected Patients Using Machine Learning Techniques

Precisely assessing the severity of persons with COVID-19 at an early stage is an effective way to increase the survival rate of patients. Based on initial screening, identifying and triaging the people at highest risk of complications that can result in death is a challenging problem, especially in developing nations, and it is further aggravated by the shortage of specialists. Using machine learning (ML) techniques to predict the severity of persons with COVID-19 during initial screening can be an effective method, enabling patients to be sorted and treated accordingly and to receive appropriate clinical management with optimum use of medical facilities. In this study, we applied and evaluated the effectiveness of three types of Artificial Neural Network (ANN), the Support Vector Machine and Random forest regression, using a variety of learning methods, for early prediction of severity from patient history and laboratory findings. The performance of the different machine learning techniques shows that clinical features can be used to precisely and quickly assess the severity of the patient and the risk of death, providing an effective method for patients to be triaged and treated accordingly.


Introduction
The novel COVID-19 (severe acute respiratory syndrome coronavirus-2) disease was detected in Wuhan city, Hubei province and reported to the WHO Country Office in China on 31 December 2019; since then it has spread to countries and territories around the whole world [1]. COVID-19 is a ribonucleic acid (RNA) virus and a member of the Coronavirus family that resides in mammals and birds. In the past, members of this virus family have also spilled over between species (from mammals and birds to humans) [1,2]. The common symptoms of coronaviruses are the common cold, fever, and a sore throat, but they vary significantly with the strain of the virus; a notable symptom of COVID-19 is tachypnea [3]. Mortality rates also vary significantly with the variety of virus; SARS has the highest mortality rate, whereas the common flu has the lowest. The severity scale of COVID-19 infection ranges from mild to critical [4]. The Chinese Center for Disease Control and Prevention reports the severity distribution of the disease as Mild (80.9%), Severe (13.8%) and Critical (4.7%), with a fatality rate of 2.3 percent [4,5]. Manually predicting the severity of the patient and the risk of death from initial screening and testing results is a challenging problem. Hence, it has become an urgent yet challenging task to identify patients at high risk of death in the infected population from testing data with the assistance of artificial intelligence. A prognostic model based on artificial intelligence could instigate early treatment for critical patients and thus potentially minimize mortality. The majority of health experts endorse testing as the essential component of controlling the outbreak. The PCR test is the front-line test for COVID-19 because it directly detects the presence of the virus RNA. Lung CT is also proposed by Ali et al. as an effective method for early detection of infection [6].
After test confirmation of infection, the patient needs to undergo clinical testing to evaluate the risk factors for severe illness. In a study of patients with confirmed symptoms of COVID-19, serious illness was found to occur in patients of all ages, but primarily in patients with advanced age or underlying medical disorders. Thus, the patient's history of medical complexities, medical reports and clinical characteristics can be an effective basis for predicting the severity of illness [7]. Researchers have made several efforts to establish correlations between clinical characteristics and medical complexities and the severity of illness. There are hundreds of clinical characteristics associated with medical complexities that can influence the severity of illness. Therefore, advanced feature selection techniques are required to identify those features which contribute most to the severity of illness, as having irrelevant clinical features significantly increases the chances of prediction error. Plenty of feature selection methods are available in the literature. Relief-based algorithms (RBAs), developed by Kira and Rendell in 1992 [8], are among the most prominent feature selection algorithms in machine learning. Relief algorithms can efficiently identify, from the set of clinical features, the relevant features which contribute to the severity of illness and have strong dependencies between them. Relief algorithms rank the applied features or input variables according to their contribution to critical illness of the infected patient.
Precisely assessing the severity of COVID-19-infected patients and the risk of death at an early stage is very important for categorizing patients, enabling them to be sorted and treated accordingly and to receive appropriate clinical management with optimum use of medical facilities, thus potentially reducing mortality. In particular, where an outbreak occurs and medical resources are relatively scarce, it is necessary to assess severity before treatment, thereby optimizing the allocation of rescue resources and preventing overtreatment or undertreatment. In clinical characteristics-based medical diagnosis, artificial intelligence (AI) techniques such as Fuzzy logic, Genetic algorithms (GA), Decision Trees, Support Vector Machines (SVM), Artificial Neural Networks (ANNs), and Deep learning have gained popularity in the field of health care for the detection, identification, and estimation of various medical problems [9][10][11].
In this paper, we applied and evaluated machine learning techniques for precise and early prediction of the severity of COVID-19-infected patients. Patient history and clinical findings were used to train different supervised machine learning techniques, and a portion of the data sets was used to evaluate the effectiveness of the proposed methods. The rest of the paper is organized as follows: the materials and methods are discussed in Section 2; in Section 3, the proposed methodology is evaluated on real-life COVID-19 datasets; Section 4 presents the conclusions.

Results
The efficiency of the proposed approach was tested with a variety of machine learning techniques. The 80 samples of the data set were randomly partitioned into a training set (62.5% of the samples) and a testing set (the remaining 37.5%). The normalized inputs were fed to each machine learning model. Performance measures [12] such as Accuracy, Sensitivity, Specificity, Precision, F-score and G-mean were used to evaluate the performance of each machine learning technique using 10-fold cross-validation.
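All of the listed performance measures can be computed directly from a binary confusion matrix. The following is a minimal Python sketch using the standard definitions of these measures; the function name and the convention that "severe" is the positive class are illustrative assumptions, not taken from the study:

```python
import math

def evaluate(tp, fn, fp, tn):
    """Compute the study's performance measures from a binary confusion
    matrix, treating the severe class as positive (an assumed convention)."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)              # recall on severe cases
    specificity = tn / (tn + fp)              # recall on non-severe cases
    precision   = tp / (tp + fp)
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean      = math.sqrt(sensitivity * specificity)
    return accuracy, sensitivity, specificity, precision, f_score, g_mean
```

For example, a model that finds 10 of 12 severe patients while misclassifying 1 of 18 mild patients would be scored as `evaluate(10, 2, 1, 17)`.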

Multilayer Perceptron
In the performance evaluation of the multi-layer perceptron (MLP), twenty neurons were used in the hidden layer. Performance evaluation of the multilayer perceptron was done using Levenberg-Marquardt backpropagation ("trainlm"), Scaled conjugate gradient backpropagation ("trainscg") and Bayesian regularization backpropagation ("trainbr"). The average time taken in performance evaluation was 2.338429 s. The performance evaluation of the multilayer perceptron neural network is given in Table 1. The confusion matrices obtained using the multilayer perceptron with "trainlm", "trainbr" and "trainscg" are shown in Figure 1.


Radial Basis Neural Network
In the performance evaluation of the RBF network, fifty neurons were used in the hidden layer. Performance evaluation of the RBF was done using different levels of spread: normal spread = 1, small spread = 0.5 and large spread = 10. The average time taken in performance evaluation was 1.370280 s. The performance evaluation of the RBF network is given in Table 2. The confusion matrices obtained using the RBF network with the different levels of spread are shown in Figure 2.

General Regression Neural Network
In the performance evaluation of the General Regression Neural Network (GRNN), fifty neurons were used in the hidden layer. Performance evaluation of the GRNN was done using different levels of spread: normal spread = 1, small spread = 0.5 and large spread = 5. The average time taken in performance evaluation was 1.109263 s. The performance evaluation of the GRNN is given in Table 3. The confusion matrices obtained using the GRNN with the different levels of spread are shown in Figure 3.


Support Vector Machine
In the performance evaluation of the SVM, Linear, Polynomial and RBF kernel functions were used to map the training data into kernel space. The average time taken in performance evaluation was 2.234734 s. The performance evaluation of the Support Vector Machine is given in Table 4. The confusion matrices obtained using the SVM with "linear", "quadratic" and "rbf" kernels are shown in Figure 4.
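The three kernel functions can be sketched as follows. This is an illustrative Python rendering of the standard kernel formulas, not the MATLAB implementation used in the study; the parameter defaults (degree 2 for the "quadratic" polynomial kernel, offset c, and the Gaussian width sigma) are assumptions:

```python
import math

def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel; degree=2 corresponds to a 'quadratic' kernel."""
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel: similarity decays with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))
```

Each function maps a pair of input vectors to a similarity score, which is how the SVM implicitly works in the transformed kernel space.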


Random Forest
In the performance evaluation of the Random Forest method, a set of one hundred classification decision trees was used as base learners. The maximum number of splits for each tree was fixed at five. The ensemble of classification trees used the LPBoost, AdaBoostM1 and Bag methods. The average time taken in performance evaluation was 2.554221 s. The performance evaluation of the Random forest method is given in Table 5.
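The "Bag" (bagging) variant of the ensemble rests on two standard building blocks, bootstrap sampling and majority voting, sketched below in Python. The helper names are illustrative and the base learners are left abstract; this is not the MATLAB ensemble implementation used in the study:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a bootstrap sample (with replacement) of the same size as the
    data; each base tree in a bagged ensemble is trained on such a sample."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Combine the class labels predicted by the base learners."""
    return Counter(predictions).most_common(1)[0][0]

# Illustrative use: three hypothetical base learners vote on one patient.
votes = ["severe", "mild", "severe"]
final_label = majority_vote(votes)  # "severe"
```

Boosting methods such as LPBoost and AdaBoostM1 differ from bagging in that they weight both training samples and base learners iteratively rather than sampling uniformly and voting equally.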



The confusion matrices obtained using the "LPBoost", "GentleBoost", "AdaBoostM1", "RobustBoost", "TotalBoost", "RUSBoost" and "Bag" ensemble methods in the Random forest are shown in Figure 5.

Discussion
All the described and implemented machine learning techniques showed high accuracy and precision and small errors in sensitivity (recall) when predicting the severity of COVID-19 patients. The best performance among all the applied methods was given by the Random forest method with ensembles of decision trees using "LPBoost", "GentleBoost" and "AdaBoostM1". The next best performance was recorded by the multilayer perceptron neural network trained with the Bayesian regularization algorithm. The RBF network showed the poorest results, especially on the sensitivity measure. The performance of the GRNN and the SVM with different parameters was fair. The performance measures of the machine learning techniques show the potential for accurate assessment of the severity of COVID-19-infected patients. The present research demonstrates that machine learning can be an effective method to assess the severity of the patient and the risk of death precisely and quickly from patient history and laboratory findings, which would enable patients to be sorted and treated accordingly and to receive appropriate clinical management with optimum use of medical facilities. The Random forest method proved to be the most efficient and reliable machine learning technique for severity prediction.

Materials and Methods
The experiment was performed on a computer with a six-core Core i7 (7th Gen.) processor, 16 GB RAM, and 8 GB of graphics memory on an NVIDIA GeForce GTX-1070 GPU, running MATLAB R2017a (64-bit). The Materials and Methods include a short description of the data sets, details on the feature selection process and a brief introduction to the various machine learning techniques and training algorithms used in the prediction of severity.

Dataset
All the proposed machine learning methodologies were implemented with the data obtained from the research article published by Zhenhuan Cao et al. [13] on page 6, "S1 data". The obtained clinical data were reviewed by a team of physicians at Peking University Clinical Research Institute. The data set includes demographic data, signs and symptoms, medical history and treatment measures of the COVID-infected patients. The data set consists of 52 features. The clinical features include patient physical properties, signs and symptoms, admission time frame, chronic medical illness, chronic obstructive pulmonary disease, oxygen saturation, systolic blood pressure, smoking history, pulse, respiratory rate, pneumonia manifestations, various blood properties and treatment measures. The performance of machine learning models depends upon the quality of the data, so preprocessing was conducted on the available data sets, including the handling of missing values and normalization.
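The two preprocessing steps mentioned can be sketched as follows. The paper does not specify which imputation or scaling schemes were used, so mean imputation and min-max scaling are illustrative assumptions:

```python
def impute_mean(column):
    """Replace missing values (represented as None) with the column mean;
    mean imputation is an assumed choice, not stated in the paper."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

def min_max_normalize(column):
    """Scale a feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]
```

Each of the 52 feature columns would be passed through both functions before being fed to the models.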

Feature Selection
Several efforts have been made to establish correlations between clinical characteristics and medical complexities and the severity of illness. There are hundreds of clinical characteristics associated with medical complexities that can influence the severity of illness. Therefore, advanced feature selection techniques are required to identify those features which contribute most to the severity of illness, as having irrelevant clinical features significantly increases the chances of prediction error. A number of methods for feature selection are available in the literature. G. Chandrashekar and F. Sahin concluded in their survey [14] that a feature selection algorithm can be designed based on Filter, Wrapper and Embedded methods and can be selected based on the number of reduced features, classification accuracy, simplicity, stability, storage and computational requirements. Relief-based algorithms (RBAs), developed by Kira and Rendell in 1992 [8], are among the most prominent feature selection algorithms in machine learning. While many feature selection algorithms assume conditional independence of the clinical features when estimating their quality, Relief algorithms do not make this assumption. Relief algorithms can efficiently identify, from the set of clinical features, the relevant features which contribute to the severity of illness and have strong dependencies between them. Feature selection was performed on the training data; the testing data were left aside for classification. Relief algorithms compute the weights of the applied 52 features or input variables according to their contribution to critical illness of the infected patient, as shown in Figures 6 and 7. The 32 selected clinical features, ranked by their importance in the prediction of severity, are shown in Figure 8.
The 20 discarded clinical features, which have a negative contribution to the prediction process, are shown in Figure 9.
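The weighting scheme of the original Relief algorithm of Kira and Rendell [8] can be sketched in a few lines: a feature is rewarded when it separates a randomly chosen instance from its nearest neighbour of the other class (nearest miss) and penalized when it differs from its nearest neighbour of the same class (nearest hit). This is an illustrative pure-Python version, not the implementation used in the study:

```python
import random

def relief_weights(X, y, n_iter, rng=None):
    """Basic Relief: accumulate, for each feature, the difference between
    its distance to the nearest miss and to the nearest hit."""
    rng = rng or random.Random(0)
    n_features = len(X[0])
    w = [0.0] * n_features

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(len(X))
        xi, yi = X[i], y[i]
        hits   = [X[j] for j in range(len(X)) if j != i and y[j] == yi]
        misses = [X[j] for j in range(len(X)) if y[j] != yi]
        near_hit  = min(hits,   key=lambda h: dist(xi, h))
        near_miss = min(misses, key=lambda m: dist(xi, m))
        for f in range(n_features):
            w[f] += abs(xi[f] - near_miss[f]) - abs(xi[f] - near_hit[f])
    return w
```

Features with high positive weights are kept; features with negative weights, like the 20 discarded clinical features, hurt discrimination between the severity classes.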

Machine Learning Methodologies
Three types of neural network architecture, Support vector regression, which is a kernel method; and Random forest regression, which is based on decision trees, are implemented in the present research. The brief methodology, working procedure, and the difference are given below.



Multilayer Perceptron
An artificial neural network is a machine learning architecture or model inspired by the human brain. The neural network is a combination of nodes or neurons. The most popular neural network is the feed-forward neural network, or multi-layer perceptron (MLP) neural network [15]; both names are used interchangeably and are widely used in neural network studies. In an MLP, every layer is connected in series, with nodes or neurons from input to output. The neural network comprises three types of layers: the input layer, the hidden layer, and finally an output layer. The number of hidden layers can be varied, and the complexity of the neural network architecture increases with the number of hidden layers and the number of neurons in each hidden layer.
The inputs are given to the first layer (input layer) and the outputs are produced at the last layer (output layer). The hidden layer has two functions: the first is an aggregation function and the other is an activation function. The aggregation function calculates a weighted sum of the input received from the input layer using the initial/updated weights and bias. After that, the activation function is applied to the aggregated net input. Activation functions can be linear or nonlinear (i.e., threshold, tan-sigmoid, log-sigmoid, etc.) depending on the application of the neural network. However, nonlinear activation functions are used most of the time in the MLP because a deterministic or binary value can be obtained from the nonlinear output using various methods (such as maximum likelihood criteria). The information flows in a forward direction in the feed-forward network, and error correction is done by a backpropagation algorithm, which works iteratively to minimize the error and optimize the weights and biases.
In the following, Y_k is an output vector and T_k is a target vector; Φ is an activation function; W_0 is a bias and W_i is a weight vector; X_i is an input vector; U_0 is the bias of the hidden layer and U_i is the weight vector of the hidden layer; H is the output of the hidden layer; H_i is an input vector of the hidden layer; and Y is the final output.
In the artificial neural network, the error function is minimized using backpropagation: the target vector is subtracted from the output vector, and the weights and biases are updated. The output of the MLP is calculated from the bias of the hidden layer plus the products of the hidden-layer weight vector and the hidden-layer input vector, passed through a linear or nonlinear activation function, i.e., Y = Φ(U_0 + Σ U_i H_i). The outputs of the hidden layer are calculated analogously from the input weight vector and the input vector, H = Φ(W_0 + Σ W_i X_i).
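A single forward pass through a one-hidden-layer MLP can be sketched as follows, using this section's notation (H = Φ(W_0 + Σ W_i X_i), then Y = Φ(U_0 + Σ U_i H_i)). The tan-sigmoid activation and a single output neuron are illustrative choices:

```python
import math

def tansig(v):
    """Tan-sigmoid activation (the hyperbolic tangent)."""
    return math.tanh(v)

def mlp_forward(x, W0, W, U0, U):
    """One forward pass: input -> hidden layer (per-neuron biases W0 and
    weight vectors W) -> single output (bias U0, weight vector U)."""
    H = [tansig(w0 + sum(wi * xi for wi, xi in zip(w, x)))
         for w0, w in zip(W0, W)]
    return tansig(U0 + sum(ui * hi for ui, hi in zip(U, H)))
```

Training algorithms such as "trainlm", "trainscg" and "trainbr" differ only in how they adjust W and U from the backpropagated error; the forward computation above is common to all of them.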

Radial Basis Function Network
The radial basis function, which is a Gaussian distribution-based kernel function, is used in a neural network to predict the output [16]. It is a nonlinear function used as an activation function in the neural network. RBF NN has three layers. The first layer is an input layer. The second is a hidden layer with a nonlinear RBF activation function. The last layer is an output layer which is linear. Basically, RBF converts the input matrix into the Gaussian form.
V(X_n) = e^(−X_n²/2). This function can be shifted to an arbitrary centre X = a, where "a" is the mean of the input vector, with standard deviation σ: V(X_n) = e^(−(X_n − a)²/(2σ²)). Here V(X_n) is the input matrix of the features. Then, we calculate the output of the hidden layer and, from it, the final output.
Φ is an activation function; W_0 is a bias and W_i is a weight vector; X_i is an input vector; U_0 is the bias of the hidden layer and U_i is the weight vector of the hidden layer; H is the output of the hidden layer; H_i is an input vector of the hidden layer; and Y is the final output. The output of each hidden neuron is calculated based on the distance between the centre of its activation (basis) function and the input to that hidden neuron.
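The distance-based response of one Gaussian hidden neuron can be sketched as follows; the function name and the default width are illustrative:

```python
import math

def rbf_activation(x, center, sigma=1.0):
    """Gaussian radial basis output for one hidden neuron: the response is
    1 when the input sits at the neuron's centre and decays with the
    squared distance between input and centre."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * sigma ** 2))
```

The "spread" parameter varied in the experiments (0.5, 1 and 10) plays the role of sigma here: a larger spread makes each neuron respond to a wider region of the input space.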

General Regression Neural Network
The General Regression Neural Network (GRNN) is a relatively recent neural network architecture [17]. In the GRNN, an arbitrary nonlinear function relating the output to the input data is optimized through training examples. Unlike the multilayer perceptron (MLP), the GRNN does not require an iterative training procedure. Prediction of a continuous variable is the case best suited to the GRNN. Since the GRNN is a variation of the RBF neural network, it also uses kernel regression. The dependent variable is predicted from the independent variables of the training examples. The GRNN estimates the joint probability density function (PDF) of the independent and dependent variables from the training examples; since the joint PDF is built from the training data, no presumption about its form is required, which makes the method more general. One of the merits of the GRNN is that, as the training set becomes larger, the error approaches zero.
Y is the predicted output vector, X is the input vector, E[y|x] is the expected value of the output, and F(x, y) is the joint density function of x and y. w_nm is a weight vector and h_n is the output of the hidden layer:

h_n = e^(−D_n²/(2·spread²)) (10)

where X is an input vector, K_n is the n-th training vector, D_n is the distance between X and K_n, and spread is a constant used to control the size of the receptive region.
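Putting Equation (10) together with the kernel-weighted average of the training targets, a GRNN prediction can be sketched as follows; this is an illustrative rendering, not the MATLAB implementation used in the study:

```python
import math

def grnn_predict(x, train_X, train_y, spread=1.0):
    """GRNN output: each training example contributes its target value,
    weighted by h_n = exp(-D_n^2 / (2 * spread^2)) as in Equation (10),
    where D_n^2 is the squared distance from x to training vector K_n."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    h = [math.exp(-d2(x, k) / (2 * spread ** 2)) for k in train_X]
    return sum(hn * yn for hn, yn in zip(h, train_y)) / sum(h)
```

Note that no iterative training occurs: the training examples themselves act as the hidden-layer centres, which is why the GRNN needs no backpropagation.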

Support Vector Machine
The Support Vector Machine (SVM) was introduced by Vapnik et al. [18] in 1997 and is based on statistical learning theory. In the SVM, the goal is to find an approximation function that predicts the dependent output variable (Y) on the basis of the independent input variables (X) such that the deviation of the predicted value from the target value is no greater than a tolerated error ε. We assume an approximation function f(x) which tries to predict the output (Y) of the model. For simplicity, we can take a linear basis function, f(x) = ⟨w, x⟩ + b. To ascertain that the approximation function provides a good prediction, the norm of w, ‖w‖² = ⟨w, w⟩, should be minimized. The cost function to optimize is then

min (1/2)‖w‖² (13)
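Written out in full, Equation (13) belongs to the standard hard ε-insensitive support vector regression problem, sketched here; the slack variables usually added for infeasible constraints are omitted for simplicity:

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
\left| y_i - \left( \langle w, x_i \rangle + b \right) \right| \le \varepsilon,
\qquad i = 1, \dots, N
```

Each constraint states exactly the tolerance condition from the text: the prediction f(x_i) = ⟨w, x_i⟩ + b may deviate from the target y_i by at most ε.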