Towards Machine Learning Algorithms in Predicting the Clinical Evolution of Patients Diagnosed with COVID-19

Andrade, Evandro Carvalho de; Pinheiro, Plácido Rogerio; Barros, Ana Luiza Bessa de Paula; Nunes, Luciano Comin; Pinheiro, Luana Ibiapina C. C.; Pinheiro, Pedro Gabriel Calíope Dantas; Holanda Filho, Raimir

doi:10.3390/app12188939

by Evandro Carvalho de Andrade ¹, Plácido Rogerio Pinheiro ¹,²,*, Ana Luiza Bessa de Paula Barros ¹,*, Luciano Comin Nunes ³, Luana Ibiapina C. C. Pinheiro ¹,*, Pedro Gabriel Calíope Dantas Pinheiro ¹ and Raimir Holanda Filho ²

¹ Graduate Program in Computer Science-PPGCC, UECE State University of Ceará, Fortaleza 60714-903, CE, Brazil
² Graduate Program in Applied Informatics, PPGIA, University of Fortaleza, Fortaleza 60811-905, CE, Brazil
³ University Center September 7, Fortaleza 60811-020, CE, Brazil
* Authors to whom correspondence should be addressed.

Appl. Sci. 2022, 12(18), 8939; https://doi.org/10.3390/app12188939

Submission received: 7 August 2022 / Revised: 28 August 2022 / Accepted: 30 August 2022 / Published: 6 September 2022

(This article belongs to the Special Issue Applied Machine Learning Ⅱ)

Abstract

Predictive modelling strategies can optimise the clinical diagnostic process by identifying patterns among various symptoms and risk factors, such as those presented in cases of infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease (COVID-19). In this context, the present research proposes a comparative analysis using benchmarking techniques to evaluate and validate the performance of several classification algorithms applied to the same dataset, which contains information collected from patients diagnosed with COVID-19 and registered in the Influenza Epidemiological Surveillance System (SIVEP). With this approach, 30,000 cases were analysed during the training and testing phase of the prediction models. This work proposes a comparative approach to machine learning (ML) algorithms, working on the knowledge discovery task to predict clinical evolution in patients diagnosed with COVID-19. Our experiments show, through appropriate metrics, that the Multilayer Perceptron algorithm performs well against other ML algorithms when classifying the clinical evolution of patients diagnosed with COVID-19. Its use has significant consequences for vital prognosis and for the agility of the measures taken in the first consultations in hospitals.

Keywords: machine learning; COVID-19; prediction; medical diagnosis optimisation

1. Introduction

The COVID-19 pandemic has spread worldwide since the first cases were reported in China in December 2019 [1]. Since then, more than 546 million cases of COVID-19 have been reported, with features of severe acute respiratory syndrome due to SARS-CoV-2. Globally, the number of weekly COVID-19 cases increased for the third consecutive week during 20–26 June 2022. COVID-19 variants such as Delta and Omicron are putting hundreds of thousands of people at risk, especially those with weakened immune systems. With the increasing spread of COVID-19, deep learning (DL) methods for identifying COVID-19 infection are widely used to track the spread of the virus [2]. Symptom association activities and epidemiological and treatment recommendations for status alerts can utilise machine learning (ML) capabilities and DL approaches to optimise the correct interpretation of diagnoses, the analysis of medical imaging exams, treatments, and possible sequelae left by the infection [3].

Predictive models can identify and classify patterns and predict outcomes based on the analysed data. By applying its techniques to structured and unstructured data, a predictive model can lead to more realistic decision-making through relevant criteria and evaluation of various attributes (characteristics), such as the symptoms of a specific disease. According to Andrew Moore of Carnegie Mellon University School of Computer Science, artificial intelligence (AI) models look for computational devices to simulate the human ability to think and solve problems [4,5]. In addition, artificial intelligence can help guide medical analyses, aiming to assess and understand the characteristics of various symptoms such as those existing in COVID-19 cases. With the current crisis, the capacity of healthcare professionals has been challenged. The interpretation of tests to obtain diagnoses and prognoses during the waves of COVID-19 required hard work that was limited by experience, speed, and fatigue. In specific healthcare settings, such as intensive care services or in health crises such as COVID-19, professionals may experience high levels of compassion fatigue (CF), and their quality of work-life (ProQoL) may be impaired [6]. These healthcare workers exposed to COVID-19 are at high risk of developing mental health issues, including anxiety, depression, and stress, so may need psychological support or interventions to help them manage their situation [7]. Professionals who care for patients with COVID-19 have higher levels of HR and burnout (BO) than those who work in other healthcare settings [8,9].

The ability of machines to perform complex tasks and make decisions independently can help these professionals to be more efficient in investigating the case and implementing treatments in the first days of symptoms. Symptom association activities, epidemiological recommendations and status alert treatment can use AI resources to optimise the correct interpretation of diagnoses, treatments, and possible sequelae left by infection [10]. The AI-based methods employed to identify, classify, and diagnose medical images have significantly improved the screening, diagnosis, and prediction of COVID-19, resulting in superior scale-up, timely response, and more reliable and efficient results and occasionally outperforming humans in certain health activities [11]. Choosing the correct artificial intelligence (AI) algorithm for a specific problem is not a trivial task. The definition of which one will be applied to a dataset to perform a predictive analysis is decisive in the quality of forecasts and selecting strategies related to the desired objective. The health area requires the control of many stages, which are highly variable and depend on other stages of the patient’s treatment. A predictive and centralised command and control system is needed to manage this variation, thus dealing with complex data, continuously learning from its experience, and improving the algorithms used in clinical predictions [12]. This study aims to evaluate the feasibility of using different ML techniques by applying predictive models to classify the clinical course of COVID-19 cases. Some metrics were used to measure the performance of the following algorithms: K-Nearest Neighbor (KNN), Naive Bayes (NB), Decision Trees (DT), Multilayer Perceptron (MLP), and Support Vector Machine (SVM). 
Once a comparative benchmark has been established between the different classification algorithms, showing which one has the best effectiveness, through the problem proposed in this study, the clinical evolution of patients with different symptoms of COVID-19 can be safely predicted.

For the learning process of the model, 129,475 cases of patients with COVID-19, registered by the state and municipal Epidemiological Surveillance bodies in the Influenza Epidemiological Surveillance System (SIVEP-Influenza) up to March 2021, were analysed.

This article is organised as follows. Section 2 highlights the concepts of COVID-19 and machine learning. Section 3 describes the methodology used in this study. Section 4 evaluates the performance of the ML models and presents the comparative benchmark between the best values obtained from each prediction model. Section 5 presents the target class prediction process. Finally, Section 6 summarises the conclusions, future work, and research limitations.

2. Background and Research

This research’s theoretical framework and related work were structured into three topics: highlighted aspects of COVID-19, machine learning, and proposals for AI solutions used to diagnose and predict the disease’s clinical evolution.

2.1. Highlights of COVID-19 Pandemic Concepts

COVID-19 has a wide spectrum, from superficial asymptomatic infection to severe pneumonia with acute respiratory distress syndrome (ARDS) and death. Anyone with consistent symptoms should be tested for SARS-CoV-2 infection [6]. Of 373,883 reported cases in the United States, 70% of the patients experienced fever, cough, and shortness of breath, 36% had muscle pain, and 34% reported headaches. Other reported symptoms include, but are not limited to, diarrhoea, dizziness, sore throat, abdominal pain, anorexia, and vomiting. SARS-CoV-2 was first identified in Wuhan, China, in December 2019. In Brazil, the first case was recorded in February 2020 in São Paulo [10]. COVID-19 has a significantly higher mortality rate than common influenza, and its transmission rate is higher than in recent epidemics such as SARS-CoV and H1N1 [11]. Sanitary measures to stop disease transmission have impacted the global socioeconomic scenario [12].

Brazil is currently the third country in the world in the total number of cases, behind only the United States and India. Furthermore, it ranks second in COVID-19 deaths [13].

2.2. Exposure of Healthcare Professionals to the COVID-19 Pandemic

The COVID-19 pandemic has exposed healthcare workers to new work-related problems [1]. Daily exposure to pandemic challenges can lead to compassion fatigue (CF), including burnout and secondary trauma (ST) [4,14].

Disease outbreaks provoke an intense response from the medical team, and fatigue, due to this challenge, has a significant impact on the mental health of health professionals, generally causing less vigilance and cognitive loss [15]. In the same way, stress with other psychological implications during the pandemic is considered to cause insomnia. Previous research on SARS identified poor sleep quality in nurses caring for patients with SARS [16].

In turn, isolation from loved ones, colleagues, and people with whom they used to have ties, the demand for long working hours, virus transmission in the workplace, and ethical concerns directly affect the physical and mental well-being of professionals [17]. Being in contact with the virus or feeling fear in day-to-day work can trigger more significant symptoms [18].

A systematic review found that many healthcare professionals experience significant anxiety, depression, and insomnia levels during the COVID-19 outbreak. A high proportion of healthcare professionals reported mild symptoms of both depression and anxiety [7,19].

Technology can reduce unnecessary visits, decrease healthcare workers’ risk of infection, reduce their workload, and optimise their time to care for patients with acute conditions [20]. Artificial intelligence technology can also be applied to monitor the mental health of professionals, for example, to recognise people and medical teams at risk of suicide or other crises through psychological messages and necessary alarms [21,22].

Machine learning seeks to find patterns and make predictions [6]. Predictive models support the decision-making process, simplify the analysis of a problem and its alternatives, and, therefore, justify the choice of a particular action [7]. Another approach, from the point of view of decision-making based on verbal factors, is to associate predictive ML models with a multi-criteria decision-support method of verbal decision analysis [8]. Based on machine learning algorithms and techniques, predictive models apply mathematical calculations to datasets, according to a specific scenario and its needs, to reveal patterns that highlight trends or determine possible clinical diagnoses through statistics and probability [10,23]. In short, they extract the valuable information stored in historical data to predict and decide the best actions. In addition to healthcare, other sectors use predictive analytics to solve complex problems and gain insights, for example in credit risk analysis, finance, the improvement of marketing actions, and supply and demand management.

AI predictive models can be supervised, when the input data are known (labelled), or unsupervised, when the ML algorithm is not told the meaning of the input data. The input data for this study are labelled. Thus, supervised models were applied to classify the data. The models aim to identify the class (target) to which an object (characteristics) belongs by mapping the input variables, such as the most important characteristics (clinical symptoms), into distinct categories, thus determining the group or class to which the data belong within the business context. Figure 1 presents the evaluation metrics used in the data classification task that are analysed in this study.

2.3. Applying Artificial Intelligence to Pandemic Data

This section of the theoretical review describes the application of ML models used in data from patients diagnosed or suspected of having COVID-19, following a technological and practical approach to AI. ML models are applied to large amounts of data to obtain pattern detection of the information related to COVID-19 [24,25]. In this context, ML models played an important role in combating the COVID-19 pandemic [26]. Furthermore, studies propose a system with artificial intelligence to improve the ability to define the diagnosis more quickly in patients with COVID-19 [14,27]. Similarly, ML models are used to predict the prognosis of patients diagnosed with SARS-CoV-2 [2,28] and are used to analyse risk factors and predict mortality among patients in the ICU with COVID-19 [29]. In addition, the continuous development of AI is an effective tool for treating the COVID-19 pandemic and has reduced human intervention in medical practice [16]. Moreover, ML solutions can combat the chaos of the pandemic and help define the prognosis [17]. In a similar approach, deep learning is used in the initial screening of patients diagnosed with COVID-19 [18]. Some AI techniques are used to analyse blood tests and CT images to develop diagnostic and prognostic models of COVID-19 [19,30].

From the perspective of intelligent systems, ML algorithms have also been used to predict physiological deterioration and death in patients diagnosed with COVID-19 [20]. ML models help analyse early mortality prediction in critically ill patients [31,32]. The K-means clustering method has also been used to provide input to the Indonesian government against the spread of COVID-19 [33].

Similarly, ML models are essential for clinical decision support to predict severity risk and the screening of patients with COVID-19 at hospital admission [34]. Studies have reported crucial symptoms such as dyspnea, cough, and fever to define the clinical course of patients with COVID-19. These resources are used as input to develop ML-based models and predict diagnostic results in patients with SARS-CoV-2 [35,36].

3. Methodology

The methodology used in this study is based on the execution of four steps, regardless of the ML method to be used.

Step 01: Data collection and measurement (selection);
Step 02: Data pre-processing;
Step 03: Model execution (transformation/mining);
Step 04: Validation of the results (interpretation/knowledge).

Figure 2 presents the execution flow to obtain knowledge through data and metrics collection, preprocessing, execution of the ML model, and final validation of the results.

This research used the collaborative environment “Colab” (Colaboratory), a product of Google Research based on the open-source project Jupyter. The sessions used had access to a processor with two cores, 12 GB of RAM, and an L3 cache of 40–50 MB. Google Colab is a free cloud service hosted by Google to encourage machine learning and artificial intelligence research, and it is widely used to run Python code with machine learning libraries and tools.

In each of the four stages, the libraries NumPy, Pandas, Seaborn, and Scikit-learn were used to manipulate and analyse the data, perform calculations, plot statistical graphs, and apply the ML methods. Specifically, NumPy is the foundational package for scientific computing in Python, and Pandas provides tools for data analysis and manipulation [37,38]. Seaborn is used for statistical plotting, and Scikit-learn is a machine learning library that supports supervised and unsupervised learning [39,40]. These are some of the main Python libraries [41].

3.1. Data Collection and Measurement

Careful data collection is essential to obtain a satisfactory result, as the quality of the collected data affects how they will be processed and interpreted through the evaluation metrics.

3.1.1. Data Collection

Health data science, also known as the solution based on data science in health, can transform the reality of professionals in this area. The focus is on applying artificial intelligence and ML algorithms to interpret and understand patient data and generate clinical predictions [3,30].

The present study collected data from 129,475 COVID-19 patients registered in SIVEP-Gripe, the system developed by the Health Surveillance Secretariat (SVS) of the Ministry of Health of Brazil. In 2020, this system incorporated specific information about COVID-19, such as the date of onset of symptoms, date of death, date of hospitalisation, associated risk factors, age, sex, date of exam collection, and status of exams, among others, for hospitalised cases of severe acute respiratory syndrome (SARS) caused by COVID-19. This improved the records of SARS deaths confirmed as COVID-19 in the SIVEP-Influenza system.

3.1.2. Data Dictionary

Table 1 describes the attributes used in this study.

The data fields FEVER, COUGH, DYSPNEA, THROAT, PAIN_ABD, FATIGUE, DIARRHEA, SATURATION, VOMIT, PERD_OLFT (loss of smell), and PERD_PALA (loss of taste) store the respective signs and symptoms of the patient, according to codes 1—Yes (if the patient presented the sign/symptom), 2—No (if the patient did not present the sign/symptom), or 0/9—Ignored (if the presence of the sign/symptom is unknown).

In addition, the RISC_FACTOR data field records the patient’s risk factors for worsening the disease. It is filled with codes 1—Yes or 2—No, depending on the existence or not of the risk factor, and 0/9—Ignored (if the presence of the risk factor is unknown). The BMI must be specified for the risk factor Obesity registered in the OBESITY field if code 1—Yes is marked for “Obesity”.

Information on the flu vaccine is registered in the VACCINE field, which records whether the patient received the flu vaccine in the last flu vaccination campaign carried out in Brazil. It is filled in with the corresponding code, 1—Yes or 2—No. The patient’s use of ventilatory support is recorded in the SUPPORT_VEN data field, with the corresponding code: 1—Yes, invasive (the patient used a ventilation technique in which prostheses and endotracheal tubes work as the patient/ventilatory support interface); 2—Yes, non-invasive (the patient used a ventilation technique in which a mask or similar device works as the patient/ventilatory support interface, without the use of prostheses and endotracheal tubes); and 3—No (the patient did not use ventilatory support).

Finally, the EVOLUTION data field records the evolution of the case, using the corresponding code of the patient’s clinical evolution: 1—Cure, 2—Death, and 3—Death by other causes. If the clinical evolution of the case is unknown, the code 9—Ignored is used.
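The coding scheme of the data dictionary can be illustrated with a short sketch. It assumes the symptom columns are stored as integer codes in a Pandas dataframe; the helper name `decode_symptoms`, the label strings, and the toy rows are hypothetical and are not taken from the study.

```python
import pandas as pd

# Codes from the data dictionary: 1 = yes, 2 = no; 0 and 9 mean "Ignored".
SYMPTOM_LABELS = {1: "yes", 2: "no"}

def decode_symptoms(df, columns):
    """Return a copy where 1/2 become labels and every other code becomes missing."""
    out = df.copy()
    for col in columns:
        # Series.map leaves values that are absent from the dict (0, 9) as NaN.
        out[col] = out[col].map(SYMPTOM_LABELS)
    return out

# Toy rows (made up for illustration only).
sample = pd.DataFrame({"FEVER": [1, 2, 9, 0], "COUGH": [2, 2, 1, 9]})
decoded = decode_symptoms(sample, ["FEVER", "COUGH"])
print(decoded)
```

Treating the "Ignored" codes as missing values keeps them from being read as a third symptom state in later steps.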

3.1.3. Data Measurement

At this stage, we seek to understand the collected data, identifying trends, and patterns to be mapped. A descriptive statistical analysis was used to understand, summarise, and describe the essential aspects of the set of observed characteristics of cases with the diagnosis of COVID-19.

Through the matrix presented in Figure 3, the correlation between attributes/symptoms is evaluated. Table 2 shows the highest correlation coefficients for the attribute pairs, demonstrating a linear relationship. The values indicate a moderately positive relationship. That is, as one attribute increases, the other attribute also increases. Taking as an example the pair with the highest correlation (0.96), when PERD_OLFT (loss of smell) increases from 1—Yes to 2—No, PERD_PALA (loss of taste) tends to increase. Furthermore, it is observed that when there is an improvement in the symptom of loss of smell, there may also be an improvement in the patient’s taste.

Table 3 presents a negative linear relationship between the attributes EVOLUTION and SUPPORT_VEN. Indeed, as the value of EVOLUTION goes from 1—Cure to 2—Death, the value of SUPPORT_VEN decreases from 2 (use of non-invasive ventilatory support) to 1 (use of invasive ventilatory support), thus demonstrating a relationship between invasive ventilatory support and death cases. In this study, attributes with high correlation were used, as they have the most significant predictive capacity (signal), and irrelevant variables (low correlation) were excluded.
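A minimal sketch of this kind of correlation screening is given below. It assumes Pearson correlation computed with Pandas on the integer codes; the function name `top_correlated_pairs` and the toy values are hypothetical, and the numbers it produces are unrelated to Tables 2 and 3.

```python
import pandas as pd

def top_correlated_pairs(df, n=3):
    """List attribute pairs sorted by the absolute value of their Pearson correlation."""
    corr = df.corr(method="pearson")
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            pairs.append((a, b, float(corr.loc[a, b])))
    # Sorting on the absolute value keeps strong negative relationships as well.
    pairs.sort(key=lambda item: abs(item[2]), reverse=True)
    return pairs[:n]

# Toy codes (1 = yes, 2 = no), invented for illustration.
toy = pd.DataFrame({
    "PERD_OLFT": [1, 1, 2, 2, 2, 1],
    "PERD_PALA": [1, 1, 2, 2, 2, 2],
    "FEVER": [1, 2, 1, 2, 1, 2],
})
pairs = top_correlated_pairs(toy)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Ranking on the absolute value matters here because a strongly negative pair, such as the one described for EVOLUTION and SUPPORT_VEN, carries as much information as a strongly positive one.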

3.2. Data Preprocessing

Data preprocessing is the process of preparing, organising, and structuring data. In this stage, techniques are used to extract knowledge, which is determined by the quality of the input data (collected data). Based on AI algorithms, data mining techniques can also help in data selection, preprocessing, and transformation, by discovering patterns and generating knowledge through their interpretations [21].

In this study, the values analysed for the EVOLUTION attribute are 1.0—Cure and 2.0—Death. The value “1.0—Cure” was considered “hospital discharge”. Cases without clinical evolution records and records with death from other causes were excluded during preprocessing, leaving 60,992 cases of patients with COVID-19.
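The exclusion rule described above can be sketched as a simple filter. This is an illustration under the assumption that EVOLUTION is a numeric column in a Pandas dataframe; the function name `keep_cure_or_death` and the toy rows are hypothetical.

```python
import pandas as pd

def keep_cure_or_death(df):
    """Keep only rows whose EVOLUTION code is 1 (cure) or 2 (death)."""
    # Missing values and the codes 3 (death by other causes) and 9 (ignored) are dropped.
    mask = df["EVOLUTION"].isin([1, 2])
    return df.loc[mask].reset_index(drop=True)

# Toy rows, invented for illustration.
raw = pd.DataFrame({
    "EVOLUTION": [1.0, 2.0, 3.0, 9.0, None, 1.0],
    "FEVER": [1, 1, 2, 9, 1, 2],
})
clean = keep_cure_or_death(raw)
print(len(raw), "->", len(clean))
```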

3.2.1. Definition of Input Data

In this phase, the desired attributes are separated from the other features of the dataframe. Thus, the input data to be used by the prediction models are defined. The input data used to determine the output class (target) are denoted as X. The target class, y, is the attribute whose value is to be predicted. The prediction models will use the input data (X) to predict the clinical course (y) of the COVID-19 cases.

3.2.2. Training and Test Data

An ML prediction model is based on the observation of data. Once the learning is complete, it can perform complex tasks and make predictions with greater precision [3]. For this reason, we divide the input data into training and testing data. This segmentation aims to acquire knowledge to simulate forecasts and evaluate their performance. In this study, the Pareto proportion was used, where 80% (48,793 cases) were used for training and 20% (12,199 cases) were used for tests.
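As a sketch of the 80/20 division, the snippet below applies Scikit-learn's `train_test_split` to a placeholder dataset with 60,992 rows. The random features, the five-column shape, and the `random_state` value are assumptions made only for illustration; the study's actual feature matrix and seed are not given in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_cases = 60_992  # number of cases left after preprocessing
rng = np.random.default_rng(0)

# Placeholder data: five coded attributes (1 = yes, 2 = no) and a coded target.
X = rng.integers(1, 3, size=(n_cases, 5))
y = rng.integers(1, 3, size=n_cases)  # 1 = cure, 2 = death

# test_size=0.2 reserves 20% of the rows for testing; the seed is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

With 60,992 rows, a 20% test fraction rounds to the same 48,793 training and 12,199 test cases reported above.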

3.3. Model Execution

At this stage, the training data are submitted to the prediction models, whose parameters are optimised according to the data presented. Subsequently, the target class is predicted: the test data are applied to the trained models, and, finally, the predictions of the clinical evolution are obtained (1—Cure or 2—Death).

3.4. Validation of Results

The evaluation metric used to determine the best ML model depends on the analysed problem [42]. Metrics applied to health problems, such as the accuracy of a diagnosis, express the ability of a prediction to discriminate between the target class and the patient’s actual prognosis. To assess the prediction of clinical evolution more critically, the diagnostic accuracy measures presented in Table 4 were used for the models of this study.

Table 4. Evaluation metrics.

Metric	Description
Accuracy	Defines the overall performance of the model: the proportion of all cases that are classified correctly [43].
Precision	Indicates whether the model is accurate in its classifications: the proportion of samples assigned to a class that actually belong to it [44].
Recall	Is the number of samples correctly classified as belonging to a class divided by the total number of samples belonging to it, including those classified in another class [44].
F1 score	Indicates the overall quality of the model as the harmonic mean of precision and recall [44].
Area Under the Curve (AUC)	Measures the area under the curve formed between the true positive rate and the false positive rate [45].
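The first four metrics in Table 4 can be written directly in terms of the confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp: true/false positives, fn/tn: false/true negatives for the chosen class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented counts, for illustration only.
metrics = classification_metrics(tp=90, fp=30, fn=10, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The example shows how a model can have high recall (few missed positives) while its precision stays lower (more false alarms), which is why the two are combined in the F1 score.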

4. Results and Discussion

The attributes to be used and the hyperparameters of each algorithm were adjusted to obtain the best behaviour for the proposed problem. Then, the performance of each algorithm was evaluated through competitive benchmarking. Metrics indicative of performance and precision were used to assess each model’s ability to learn and to demonstrate a satisfactory result in an authentic context.

4.1. KNN Results

The first model applied was K-Nearest Neighbor (KNN). For this method, the analyses shown in Table 5 were performed, using three different K values (5, 25, and 45). Three distance metrics were analysed for each K value: Euclidean, Manhattan, and Hamming.

The configuration with K equal to 45 and the Hamming distance measure, shown in Figure 4, had the best overall performance, with an accuracy of 75.27%. Its execution time was 69.995 s. The fastest execution, taking 7.132 s, used the Euclidean distance and K equal to 5, but it was the least accurate among the analysed K values.

A precision of 74.93% was obtained with these parameters, indicating the percentage of classifications of clinical evolution that were correct. Recall (sensitivity), which indicates how often the cases of a class are classified correctly, obtained a value of 0.9322 (the optimal value is 1), so KNN is correct with good frequency when classifying the evolution of the clinical case as hospital discharge or death. Combining the precision and recall values gives an F1 score of 0.83082 (optimal value equal to 1), which agrees with their harmonic mean: 2 × 0.7493 × 0.9322 / (0.7493 + 0.9322) ≈ 0.8308. The AUC was computed for all K values and the respective distances. Figure 5 shows the ROC curve generated with the best parameters identified, reaching a value of 0.76 (ROC score). With this information, the clinical evolution of a randomly chosen patient is classified with 76% assertiveness using the K-Nearest Neighbor (KNN) prediction model.
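The search over K values and distance measures can be sketched with Scikit-learn as follows. The synthetic coded data, the seed, and the split are assumptions for illustration; the scores it prints are unrelated to Table 5.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the coded symptoms (1 = yes, 2 = no); not the SIVEP data.
rng = np.random.default_rng(0)
X = rng.integers(1, 3, size=(600, 8))
noise = rng.integers(0, 2, size=600)
y = np.where(X[:, :3].sum(axis=1) + noise >= 5, 2, 1)  # 1 = cure, 2 = death

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Try each K value with each distance measure and record the test accuracy.
results = {}
for k in (5, 25, 45):
    for metric in ("euclidean", "manhattan", "hamming"):
        model = KNeighborsClassifier(n_neighbors=k, metric=metric)
        model.fit(X_train, y_train)
        results[(k, metric)] = model.score(X_test, y_test)

best = max(results, key=results.get)
print("best configuration:", best, "accuracy:", round(results[best], 4))
```

The Hamming distance counts the fraction of attributes on which two patients differ, which is a natural fit for categorical codes such as yes/no symptoms.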

4.2. Naive Bayes Results

Then, the Naive Bayes model was applied to the training data sample. Table 6 presents the results obtained using the following distributions: Gaussian, Bernoulli, and Multinomial.

With an accuracy of 66.62%, the Naive Bayes classifier using the Multinomial distribution stood out from the other variants analysed. The precision metric, used to investigate whether the model is accurate in its classifications, reached 66.98%. Sensitivity or recall, which indicates how often the patient’s clinical evolution is classified correctly, obtained a value of 0.9618 (the optimum being 1), a significant result. The harmonic mean of precision and recall (F1 score) was 0.78966, indicating the model’s overall quality.

The Area Under the ROC Curve (AUC) was verified for all Naive Bayes distributions, as shown in Figure 6. The highest value reached was 0.64 for the multinomial distribution, which means a 64% chance of correctly classifying clinical evolution using the multinomial naive Bayes.
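The three Naive Bayes variants can be compared with a sketch like the one below. The synthetic data and seed are assumptions, and the `binarize=1.5` threshold for the Bernoulli variant is a choice made here so that the codes 1 and 2 map to two distinct binary values; the study's own settings are not stated in the text.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in for the coded symptoms (1 = yes, 2 = no); not the SIVEP data.
rng = np.random.default_rng(0)
X = rng.integers(1, 3, size=(600, 8))
noise = rng.integers(0, 2, size=600)
y = np.where(X[:, :3].sum(axis=1) + noise >= 5, 2, 1)  # 1 = cure, 2 = death
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

variants = {
    "gaussian": GaussianNB(),
    "bernoulli": BernoulliNB(binarize=1.5),  # assumption: 1 -> 0, 2 -> 1
    "multinomial": MultinomialNB(),          # requires non-negative features
}

scores = {}
for name, model in variants.items():
    model.fit(X_train, y_train)
    death_col = list(model.classes_).index(2)
    proba_death = model.predict_proba(X_test)[:, death_col]
    scores[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score((y_test == 2).astype(int), proba_death),
    }

for name, values in scores.items():
    print(name, {k: round(v, 4) for k, v in values.items()})
```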

4.3. Results of the Decision Trees

The Decision Tree model was also applied to the same training data. Table 7 presents the result obtained. The Gini index and entropy were used as split criteria in its analysis.

The best accuracy of 71.83% was obtained using entropy. This parameter defines how to measure the purity of each subset in each decision tree. In other words, it measures the probability of obtaining an occurrence of a positive event (hospital discharge) from a random selection of the data subset. It is observed that the precision obtained was 74.87%, the sensitivity (recall) was 0.8544, and the harmonic mean between these two variables (F1 score) was 0.79808.

The value reached for the ROC curve analysis was 0.69 (with the entropy parameter), as shown in Figure 7. There is a 69% chance of correctly classifying the patient’s clinical evolution using the Decision Tree model prediction.
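The two purity measures compared for the Decision Tree can be written out as a short sketch using their standard definitions. The function names and the example class proportions are hypothetical and do not come from the study's data.

```python
import math

def entropy(proportions):
    """Shannon entropy (in bits) of a list of class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    """Gini impurity of a list of class proportions."""
    return 1.0 - sum(p * p for p in proportions)

# A perfectly mixed subset (half discharge, half death) is maximally impure.
mixed = [0.5, 0.5]
# A subset that is mostly one class is purer under both measures.
skewed = [0.9, 0.1]

print("entropy:", entropy(mixed), round(entropy(skewed), 4))
print("gini:   ", gini(mixed), round(gini(skewed), 4))
```

In Scikit-learn, these measures correspond to the `criterion="entropy"` and `criterion="gini"` options of `DecisionTreeClassifier`; at each split the tree picks the attribute that most reduces the chosen impurity.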

4.4. Multilayer Perceptron Results

Applying the Multilayer Perceptron (MLP) model, the values shown in Table 8 were obtained using different learning rates and momentum.

Using MLP with the learning rate set to adaptive and momentum equal to 0.9, an accuracy of 76.3% was obtained. The precision was 76.41%, and the sensitivity was 0.76466.

The F1 score, which indicates the general quality of the model, reached a value of 0.76441. In the ROC analysis, the best result was 0.84 (ROC score), corresponding to an estimated 84% chance of correctly classifying clinical evolution using the Multilayer Perceptron (MLP) model. Based on the hypothesis presented in this study, the behaviour of the MLP networks trained on the COVID-19 dataset proved satisfactory, falling short only in the time required for model training.
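A minimal sketch of an MLP configured with an adaptive learning rate and a momentum of 0.9 is shown below. In Scikit-learn these two options only take effect with the `sgd` solver, so that solver is assumed here; the hidden layer size, iteration limit, seed, and synthetic data are also assumptions and are not taken from the study.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the coded symptoms (1 = yes, 2 = no); not the SIVEP data.
rng = np.random.default_rng(0)
X = rng.integers(1, 3, size=(600, 8))
noise = rng.integers(0, 2, size=600)
y = np.where(X[:, :3].sum(axis=1) + noise >= 5, 2, 1)  # 1 = cure, 2 = death
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

mlp = MLPClassifier(
    hidden_layer_sizes=(16,),   # assumption: one small hidden layer
    solver="sgd",               # learning_rate and momentum apply to this solver
    learning_rate="adaptive",
    momentum=0.9,
    max_iter=300,
    random_state=0,
)
mlp.fit(X_train, y_train)

death_col = list(mlp.classes_).index(2)
mlp_accuracy = accuracy_score(y_test, mlp.predict(X_test))
mlp_auc = roc_auc_score(
    (y_test == 2).astype(int), mlp.predict_proba(X_test)[:, death_col]
)
print("accuracy:", round(mlp_accuracy, 4), "AUC:", round(mlp_auc, 4))
```

Training may stop at the iteration limit with a convergence warning on data like this, which echoes the training-time cost noted for the MLP.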

4.5. Results of the Support Vector Machine

The results of the fifth and last applied model, Support Vector Machine (SVM), using different values for the kernel, width (gamma), cost (C), and degree parameters, are presented in Table 9.

The parameters γ (gamma), C (cost), and degree were changed during the analysis of this algorithm, aiming for the best results for the learning of the model. During this exercise, it was noticed that as the values increased, the performance and complexity of the classifier increased.

An accuracy of 75.78% was obtained using the RBF kernel, with C equal to 3 and gamma equal to auto. This was the best-performing configuration of this algorithm. A substantial drop in the algorithm’s performance is noticed when further parameters are added to this configuration. Its precision was 76.61%, with a sensitivity of 0.91320 and a harmonic mean (F1 score) of 0.83318. In the ROC curve analysis, the configuration with a linear kernel, C equal to 1, and gamma equal to scale presented the best ROC score of 0.73772, as shown in Figure 8. Using these values, a patient has a 73% chance of being correctly classified using the Support Vector Machine.
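The two SVM configurations named above can be sketched with Scikit-learn's `SVC`. The synthetic data and seed are assumptions for illustration, and the AUC is computed here from the decision function, which is one possible choice rather than the study's documented procedure.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the coded symptoms (1 = yes, 2 = no); not the SIVEP data.
rng = np.random.default_rng(0)
X = rng.integers(1, 3, size=(600, 8))
noise = rng.integers(0, 2, size=600)
y = np.where(X[:, :3].sum(axis=1) + noise >= 5, 2, 1)  # 1 = cure, 2 = death
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

configs = {
    "rbf_C3_auto": SVC(kernel="rbf", C=3, gamma="auto"),
    "linear_C1_scale": SVC(kernel="linear", C=1, gamma="scale"),
}

svm_scores = {}
for name, model in configs.items():
    model.fit(X_train, y_train)
    # For two classes, larger decision values favour classes_[1] (here, code 2).
    margin = model.decision_function(X_test)
    svm_scores[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score((y_test == 2).astype(int), margin),
    }

for name, values in svm_scores.items():
    print(name, {k: round(v, 4) for k, v in values.items()})
```

Note that gamma is ignored by the linear kernel, so the "scale" setting only matters for the RBF and other non-linear kernels.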

4.6. Discussion

Across the five algorithms analysed, the parameters explicitly defined to control the learning process produced different values for the metrics that indicate each algorithm's performance. For each model, the data were explored by testing different configurations at the beginning of the learning process. As a result, the fastest algorithm in execution was the Decision Tree using entropy, at only 0.05444 s. The best precision and F1 score were obtained with the Support Vector Machine (SVM) algorithm, using the RBF kernel, C equal to 3, and gamma equal to auto. The multinomial Naive Bayes algorithm had the best sensitivity (recall), and K-Nearest Neighbor (KNN) also stood out in this metric. The best values obtained from each prediction model were then compared in a benchmark, which is summarised in Table 10.
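The per-metric comparison can be reproduced directly from the values in Table 10. The sketch below copies those values (with accuracy and precision expressed as fractions) and reports which model leads on each metric; execution times are left out, and the dictionary layout and variable names are illustrative.

```python
# Values copied from Table 10; accuracy and precision converted to fractions.
table10 = {
    "KNN": {"accuracy": 0.7527, "precision": 0.7493, "recall": 0.9322,
            "f1": 0.83082, "roc": 0.76202},
    "Naive Bayes": {"accuracy": 0.6662, "precision": 0.6698, "recall": 0.9618,
                    "f1": 0.78966, "roc": 0.64363},
    "Decision Tree": {"accuracy": 0.7183, "precision": 0.7487, "recall": 0.8544,
                      "f1": 0.79808, "roc": 0.69686},
    "MLP": {"accuracy": 0.7630, "precision": 0.7641, "recall": 0.7646,
            "f1": 0.76441, "roc": 0.84300},
    "SVM": {"accuracy": 0.7578, "precision": 0.7661, "recall": 0.91320,
            "f1": 0.83318, "roc": 0.73772},
}

metrics = ("accuracy", "precision", "recall", "f1", "roc")

# For each metric, pick the model with the highest value.
best_by_metric = {
    metric: max(table10, key=lambda name: table10[name][metric])
    for metric in metrics
}

for metric, model_name in best_by_metric.items():
    print(f"{metric}: {model_name}")
```

No single model leads on every metric, which is the practical point behind the no free lunch remark in this discussion.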

Accuracy and reliability are essential in health studies [22,46]. According to the no free lunch theorem, an algorithm that outperforms another on one metric may underperform on a different metric, depending on the problem. In general, therefore, there is no certainty about which algorithm is best. However, the Multilayer Perceptron (MLP) algorithm, which is extremely fast when making predictions, obtained the best results in the present study. Training the MLP networks with backpropagation took considerable time, despite the use of a momentum term. In analysing the cost of pattern recognition for predicting clinical evolution, priority was given to achieving the best precision while preserving the speed of diagnosis. Next, the predictions made with the winning MLP prediction model are presented.

5. Experimental Evaluation

An experimental process was defined to analyse the ability to predict the target class, based on random tests in which different attribute (symptom) values were applied to the MLP model. The following parameters were used to predict the target class (clinical evolution):

Model: Multilayer Perceptron (MLP):
Parameter 01: Learning rate = adaptive;
Parameter 02: Momentum = 0.9;
Parameter 03: Solver = SGD.
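In scikit-learn, these three parameters map onto the `MLPClassifier` constructor as sketched below. Only the solver, learning rate schedule, and momentum come from the list above; every other setting (hidden layer sizes, number of iterations, and so on) is left at the library default here because the text does not state it, and `random_state` is an added assumption for repeatability.

```python
from sklearn.neural_network import MLPClassifier

# Parameters 01 to 03 from the list above; all other settings are library defaults.
mlp = MLPClassifier(
    solver="sgd",              # Parameter 03
    learning_rate="adaptive",  # Parameter 01
    momentum=0.9,              # Parameter 02
    random_state=0,            # assumption: fixed seed for repeatable runs
)

print(mlp.get_params()["solver"], mlp.get_params()["learning_rate"])
```

Note that scikit-learn only applies `learning_rate` and `momentum` when `solver="sgd"`, so the three parameters belong together.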

5.1. Definition of Values

Table 11 presents the questionnaire on the symptoms of COVID-19. Responses were defined at random for five clinical cases.

5.2. Prediction of Clinical Evolution

By applying the scikit-learn methods predict(X) and predict_proba(X), the prediction of the clinical evolution of the patient diagnosed with COVID-19 was obtained, together with the estimated probability of that class.

The model's accuracy of 76.3% and precision of 76.41% in the classification of the target class (hospital discharge or death) underlie the prediction results shown in Table 12. Accuracy and precision are close to each other, which is taken to indicate the absence of systematic errors.

Using the symptoms of the five patients as input data not seen during the training and testing phase of the MLP model, the following prediction results are observed:

In general, the model obtains a probability above 70% in the classification of the target class (hospital discharge or death);
Patient 01 has a 73% probability of being discharged from the hospital;
Patient 02 has an 84% probability of clinical evolution to death;
Patient 03 has a 92% probability of clinical evolution to death;
Patient 04 has an 86% probability of clinical evolution to death;
Patient 05 reaches an 85% probability of discharge from the hospital.
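The prediction step can be sketched as follows. This is an illustration under loud assumptions: the training data are randomly generated stand-ins for the SIVEP records, and the target is a made-up rule, so the printed class and probabilities are not expected to match Table 12. Only the `predict` and `predict_proba` calls, the MLP parameters, and Patient 02's answers come from the text.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient records (assumption): 15 answers coded 0, 1 or 2.
X_train = rng.integers(0, 3, size=(300, 15))
# Made-up target rule (assumption): 1 = cure, 2 = death.
y_train = np.where(X_train[:, 2] + X_train[:, 9] >= 3, 1, 2)

model = MLPClassifier(
    solver="sgd", learning_rate="adaptive", momentum=0.9,
    max_iter=300, random_state=0,  # max_iter and seed are assumptions
)
model.fit(X_train, y_train)

# Patient 02's answers from Table 12, in questionnaire order (questions 1 to 15).
patient_02 = np.array([[2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1]])

predicted_class = model.predict(patient_02)[0]
class_probabilities = model.predict_proba(patient_02)[0]

# Pair each class label with its estimated probability.
print(predicted_class, dict(zip(model.classes_, class_probabilities)))
```

The columns of `predict_proba` follow the order of `model.classes_`, so the percentage reported for a patient is the entry that corresponds to the predicted class.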

6. Conclusions and Future Works

This section reviews the purpose of the study: applying an optimised machine learning model to predict the clinical evolution of patients diagnosed with COVID-19. A case study was carried out with ML techniques to classify the clinical evolution in cases of COVID-19. From a historical base of patients, 30,000 cases were analysed during the training and testing phase of the prediction models. A competitive benchmark was obtained by comparing the metrics, aiming at behaviour closer to reality. Compared with the K-Nearest Neighbor (KNN), Naive Bayes, Decision Tree, and Support Vector Machine (SVM) algorithms, the Multilayer Perceptron (MLP) showed behaviour better suited to this study's approach, which helped with the recognition of patterns in the data not observed in the model preprocessing phase [15]. In this way, the proposed objective was achieved: the clinical evolution of patients diagnosed with COVID-19 was classified from their symptoms by an optimised prediction model. The implication of this conclusion is to indicate, among the analysed algorithms, the one with the highest performance. In this case, it is the MLP with specific parameters, which classifies the clinical evolution of patients diagnosed with COVID-19 through the symptoms identified in the SIVEP database. Furthermore, it was necessary to analyse different characteristics of each competing algorithm (K-Nearest Neighbor, Naive Bayes, Decision Trees, and Support Vector Machine), showing their details and where each stands out on the same database.

Suggestions for future work include clinical data analysis using bootstrap and time series analysis for prognosis classification of COVID-19 patients; using ensemble learning to combine the results of various ML algorithms and performing a comparative benchmark with COVID-19 patient data from other countries; and analysing the performance of some convolutional neural network algorithms, as well as the Farmland Fertility algorithm [46,47], African Vultures Optimization Algorithm [48], and Artificial Gorilla Troops Optimizer [49], on the same COVID-19 datasets.

This study has potential limitations. The effect estimates in the model are based on the interventional and prospective observational studies of five predictive ML models, used to build a benchmark with the analysed data from patients diagnosed with COVID-19. For the construction of this comparative model, only the classification of evolution to death or not was analysed, without analysis of the other clinical outcomes. More generally, machine learning models in medical applications have other critical limitations stemming from the quality of the data: poor data can change a patient's diagnosis, and data are exposed to the risk of intentional manipulation, which can introduce a particular bias into the algorithm and lead doctors to wrong conclusions.

Author Contributions

Conceptualization, E.C.d.A.; Investigation, P.G.C.D.P.; Methodology, P.R.P. and A.L.B.d.P.B.; Project administration, L.C.N.; Supervision, R.H.F.; Validation, L.I.C.C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Scientific and Technological Development Council (CNPq) Grants Nos. 304272/2020-5 and 306389/2020-7.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No datasets were generated during the current study. The datasets analysed during this work are publicly available in this published article.

Acknowledgments

The authors Placido Rogerio Pinheiro and Raimir Holanda Filho would like to thank Fundação Edson Queiroz/Universidade de Fortaleza.

Conflicts of Interest

The authors have no competing interest to declare relevant to this article’s content.

References

1. WHO. Coronavirus Disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update. Available online: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (accessed on 9 December 2020).
2. Heidari, A.; Navimipour, N.J.; Unal, M.; Toumaj, S. Machine learning applications for COVID-19 outbreak management. Neural Comput. Appl. 2022, 34, 15313–15348.
3. Heidari, A.; Toumaj, S.; Navimipour, N.J.; Unal, M. A privacy-aware method for COVID-19 detection in chest CT images using lightweight deep conventional neural network and blockchain. Comput. Biol. Med. 2022, 145, 105461.
4. Andrade, E.A. Hybrid Model in Machine Learning and Verbal Decision Analysis Applied to the Diagnosis of Master. Master’s Thesis, University of Fortaleza, Fortaleza, Brazil, 2020.
5. Souza, R.W.R.; Silva, D.S.; Passos, L.A.; Roder, M.; Santana, M.C.; Pinheiro, P.R.; Albuquerque, V.H.C. Computer-Assisted Parkinson’s Disease Diagnosis Using Fuzzy Optimum-Path Forest, and Restricted Boltzmann Machines. Comput. Biol. Med. 2021, 131, 104260.
6. Andrade, E.C.; Pinheiro, P.R.; Filho, R.H.; Nunes, L.C.; Pinheiro, M.C.D.; Abreu, W.C.; Filho, M.S.; Pinheiro, L.I.C.C.; Pereira, M.L.D.; Pinheiro, P.G.C.D.; et al. Application of Machine Learning to Infer Symptoms and Risk Factors of COVID-19. In The International Research & Innovation Forum; Springer: Cham, Switzerland, 2022; pp. 13–24.
7. Pinheiro, P.R.; Tamanini, I.; Pinheiro, M.C.D.; Albuquerque, V.H.C. Evaluation of Alzheimer’s Disease Clinical Stages under the Optics of Hybrid Approaches in Verbal Decision Analysis. Telemat. Inform. 2018, 35, 776–789.
8. Andrade, E.; Portela, S.; Pinheiro, P.R.; Comin, L.N.; Filho, M.S.; Costa, W.; Pinheiro, M.C.D. A Protocol for the Diagnosis of Autism Spectrum Disorder Structured in Machine Learning and Verbal Decision Analysis. Comput. Math. Methods Med. 2021, 2021, 1628959.
9. Ruiz-Fernández, M.D.; Pérez-García, E.; Ortega-Galán, M. Quality of Life in Nursing Professionals: Burnout, Fatigue, and Compassion Satisfaction. Int. J. Environ. Res. Public Health 2020, 17, 1253.
10. Shereen, M.A.; Khan, S.; Kazmi, A.; Bashir, N.; Siddique, R. COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses. J. Adv. Res. 2020, 24, 91–98.
11. Heidari, A.; Navimipour, N.J.; Unal, M.; Toumaj, S. The COVID-19 epidemic analysis and diagnosis using deep learning: A systematic literature review and future directions. Comput. Biol. Med. 2021, 141, 105141.
12. Matos, P.; Costa, A.; Silva, C. COVID-19, stock market and sectoral contagion in the U.S.: A time-frequency analysis. Res. Int. Bus. Financ. 2021, 57, 101400.
13. Vizheh, M.; Qorbani, M.; Arzaghi, S.M.; Muhidin, S.; Javanmard, Z.; Esmaeili, M. The mental health of healthcare workers in the COVID-19 pandemic: A systematic review. J. Diabetes Metab. Disord. 2020, 19, 1967–1978.
14. Jin, C.; Chen, W.; Cao, Y.; Xu, Z.; Tan, Z.; Zhang, X.; Deng, L.; Zheng, C.; Zhou, J.; Shi, H.; et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat. Commun. 2020, 11, 5088.
15. Carvalho, D.; Pinheiro, P.R.; Pinheiro, M.C.D. A Hybrid Model to Support the Early Diagnosis of Breast Cancer. Procedia Comput. Sci. 2016, 91, 927–934.
16. Lalmuanawma, S.; Hussain, J.; Chhakchhuak, L. Applications of machine learning and artificial intelligence for the covid-19 (SARS-COV-2) pandemic: A review. Chaos Solitons Fractals 2020, 139, 110059.
17. Booth, A.L.; Abels, E.; McCaffrey, P. Development of a prognostic model for mortality in covid-19 infection using machine learning. Mod. Pathol. 2021, 34, 522–531.
18. Liang, W.; Yao, J.; Chen, A.; Lv, Q.; Zanin, M.; Liu, J.; Wong, S.; Li, Y.; Lu, J.; Liang, H.; et al. Early triage of critically ill COVID-19 patients using deep learning. Nat. Commun. 2020, 11, 3543.
19. Yang, P.; Xie, Y.; Rao, X.; Frix, A.N.; Moutschen, M.; Li, J.; Du, D.; Zhao, S.; Ding, Y.; Liu, B.; et al. Development of a clinical decision support system for severity risk prediction and triage of COVID-19 patients at hospital admission: An international multicenter study. Eur. Respir. J. 2020, 56, 2001104.
20. Gao, Y.; Cai, G.-Y.; Fang, W.; Li, H.-Y.; Wang, S.-Y.; Chen, L.; Yu, Y.; Liu, D.; Xu, S.; Cui, P.-F.; et al. Machine learning based early warning system enables accurate mortality risk prediction for COVID-19. Nat. Commun. 2020, 11, 5033.
21. Pinheiro, L.I.C.C.; Pereira, M.L.D.; Andrade, E.C.; Nunes, L.C.; Abreu, W.C.; Pinheiro, P.G.C.D.; Filho, R.H.; Pinheiro, P.R. An Intelligent Multicriteria Model for Diagnosing Dementia in People Infected with Human Immunodeficiency Virus. Appl. Sci. 2021, 11, 10457.
22. Castro, A.K.A.; Pinheiro, P.R.; Pinheiro, M.C.D.; Tamanini, I. Towards the Applied Hybrid Model in Decision Making: A Neuropsychological Diagnosis of Alzheimer’s Disease Study Case. Int. J. Comput. Intell. Syst. 2011, 4, 89–99.
23. Russell, J.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice-Hall: Hoboken, NJ, USA, 2010.
24. Shilo, S.; Rossman, H.; Segal, E. Axes of a revolution challenges and promises big data in healthcare. Nat. Med. 2020, 26, 29–38.
25. Yu, K.H.; Beam, L.A.; Kohane, I.S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2018, 2, 719–731.
26. Alimadadi, A.; Aryal, S.; Manandhar, I.; Munroe, P.B.; Joe, B.; Cheng, X. Artificial intelligence and machine learning to fight COVID-19. Physiol. Genom. 2020, 52, 200–202.
27. Mei, X.; Lee, H.C.; Diao, K.Y.; Huang, M.; Lin, B.; Liu, C.; Xie, Z.; Ma, Y.; Robson, P.M.; Chung, M.; et al. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat. Med. 2020, 26, 1224–1228.
28. Yan, L.; Zhang, H.-T.; Goncalves, J.; Xiao, Y.; Wang, M.; Guo, Y.; Sun, C.; Tang, X.; Jing, L.; Zhang, M.; et al. An interpretable mortality prediction model for COVID-19 patients. Nat. Mach. Intell. 2020, 2, 283–288.
29. Pan, P.; Li, Y.; Xiao, Y.; Han, B.; Su, L.; Su, M.; Li, Y.; Zhang, S.; Jiang, D.; Chen, X.; et al. Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: Model development and validation. J. Med. Internet Res. 2020, 22, e23128.
30. Waheed, A.; Goyal, M.; Gupta, D.; Khanna, A.; Al-Turjman, F.; Pinheiro, P.R. CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved COVID-19 Detection. IEEE Access 2020, 8, 91916–91923.
31. Bai, T.; Zhu, X.; Zhou, X.; Grathwohl, D.; Yang, P.; Zha, Y.; Jin, Y.; Chong, H.; Yu, Q.; Isberner, N.; et al. Reliable and Interpretable Mortality Prediction With Strong Foresight in COVID-19 Patients: An International Study From China and Germany. Front. Artif. Intell. 2021, 4, 672050.
32. Heldt, F.S.; Vizcaychipi, M.P.; Peacock, S.; Cinelli, M.; McLachlan, L.; Andreotti, F.; Jovanović, S.; Dürichen, R.; Lipunova, N.; Fletcher, R.A.; et al. Early risk assessment for COVID-19 patients from emergency department data using machine learning. Sci. Rep. 2021, 11, 4200.
33. Abdullah, D.; Susilo, S.; Ahmar, A.S.; Rusli, R.; Hidayat, R. The application of K-means clustering for province clustering in Indonesia of the risk of the COVID-19 pandemic based on COVID-19 data. Qual. Quant. 2021, 56, 1283–1291.
34. Ryan, L.; Lam, C.; Mataraso, S.; Allen, A.; Green-Saxen, A.; Pellegrini, E.; Hoffman, J.; Barton, C.; McCoy, A.; Das, R. A Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: A retrospective study. Ann. Med. Surg. 2020, 59, 207–216.
35. Assaf, D.; Gutman, Y.; Neuman, Y.; Segal, G.; Amit, S.; Gefen-Halevi, S.; Shilo, N.; Epstein, A.; Mor-Cohen, R.; Biber, A.; et al. Utilization of machine-learning models to accurately predict the risk for critical COVID-19. Intern. Emerg. Med. 2020, 15, 1435–1443.
36. Fernandes, F.T.; de Oliveira, T.A.; Teixeira, C.E.; Batista, A.F.D.M.; Costa, G.D.; Filho, A.D.P.C. A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil. Sci. Rep. 2021, 11, 3343.
37. Numpy. Available online: https://numpy.org/ (accessed on 5 January 2022).
38. Pandas. Available online: https://pandas.pydata.org/ (accessed on 15 January 2022).
39. Seaborn. Available online: https://seaborn.pydata.org/ (accessed on 12 January 2022).
40. Scikit-Learn. Available online: https://scikit-learn.org/stable/ (accessed on 7 January 2022).
41. Python. Available online: https://www.python.org/ (accessed on 11 January 2022).
42. Korbut, D. Machine Learning Algorithms: Which to Choose for Your Problem. Available online: https://blog.statsbot.co/machine-learning-algorithms-183cc73197c (accessed on 7 January 2022).
43. Presesti, E.; Gosmaro, F. Trueness, Precision and Accuracy: A Critical Overview of the Concepts and Proposals for revision. Accredit. Qual. Assur. 2015, 20, 33–40.
44. Powers, D.M. Evaluation: From precision, recall, and f-measure to roc, informedness, markedness, and correlation. ArXiv 2020, arXiv:2010.16061.
45. Fawcett, T. Roc graphs: Notes and practical considerations for researchers. Mach. Learn. 2004, 31, 1–38.
46. Zrigat, E.; Altamimi, A.; Azzeh, M. A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods. Int. J. Comput. Sci. Inf. Secur. 2016, 14, 868–879.
47. Shayanfar, H.; Gharehchopogh, F.S. Farmland fertility: A new metaheuristic algorithm for solving continuous optimization problems. Appl. Soft Comput. 2018, 71, 728–746.
48. Abdollahzadeh, B.; Gharehchopogh, F.S.; Mirjalili, S. African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems. Comput. Ind. Eng. 2021, 158, 107408.
49. Abdollahzadeh, B.; Gharehchopogh, F.S.; Mirjalili, S. Artificial gorilla troops optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Int. J. Intell. Syst. 2021, 36, 5887–5958.

Figure 1. Measures for data classification.

Figure 2. Get knowledge cycle.

Figure 3. Correlation.

Figure 4. K optimized.

Figure 5. ROC curve—Hamming.

Figure 6. ROC curve—multinomial.

Figure 7. ROC curve—Decision Tree—entropy.

Figure 8. ROC curve—SVM.

Table 1. Data dictionary.

Individual Record Form—Hospitalised Severe Acute Respiratory Syndrome Cases
Field Name	Type	Allowed Values	Description
FEVER	Varchar2 (1)	1—Yes 2—No 0—Ignored 9—Ignored	Did the patient have a fever?
COUGH	Varchar2 (1)		Did the patient cough?
DYSPNEA	Varchar2 (1)		Did the patient have dyspnea?
THROAT	Varchar2 (1)		Did the patient have a sore throat?
PAIN_ABD	Varchar2 (1)		Did the patient have abdominal pain?
FATIGUE	Varchar2 (1)		Did the patient experience fatigue?
DIARRHEA	Varchar2 (1)		Did the patient have diarrhoea?
SATURATION	Varchar2 (1)		Did the patient have O₂ saturation <95%?
VOMIT	Varchar2 (1)		Did the patient experience vomiting?
PERD_OLFT	Varchar2 (1)		Did the patient experience a loss of smell?
LOST_PALA	Varchar2 (1)		Did the patient experience taste loss?
RISC_FACTOR	Varchar2 (1)		Does the patient have risk factors?
OBESITY	Varchar2 (1)		Does the patient have obesity?
VACCINE	Varchar2 (1)		Was the patient vaccinated against influenza in the last campaign?
SUPPORT_VEN	Varchar2 (1)	1—Yes, invasive 2—Yes, non-invasive 3—No 0—Ignored 9—Ignored	Did the patient use ventilatory support?
EVOLUTION	Varchar2 (1)	1—Cure 2—Death	Evolution of the case

Table 2. Positive correlation.

Attribute 01	Attribute 02	Correlation Value
THROAT	VOMIT	0.81
PAIN_ABD	PERD_OLFT	0.85
FATIGUE	PAIN_ABD	0.86
DIARRHEA	PAIN_ABD	0.82
VOMIT	DIARRHEA	0.9
PERD_OLFT	LOST_PALA	0.96

Table 3. Negative correlation.

Attribute 01	Attribute 02	Correlation Value
EVOLUTION	SUPPORT_VEN	−0.19

Table 5. Benchmark for K-Nearest Neighbor—KNN.

K-Nearest Neighbor—KNN
Metrics/Distance KNN	Accuracy	Precision	Recall	F1 Score	Accuracy	Precision	Recall	F1 Score	Accuracy	Precision	Recall	F1 Score
Neighbor K	5				25				45
Euclidean	71.17%	74.31%	0.8518	0.79380	73.47%	74.25%	0.9073	0.81671	74.45%	75.04%	0.9107	0.82283
Manhattan	71.17%	74.32%	0.8516	0.79375	74.07%	74.62%	0.9122	0.82090	74.87%	75.09%	0.9191	0.82654
Hamming	71.87%	74.61%	0.8613	0.79957	74.65%	74.71%	0.9235	0.82599	75.27%	74.93%	0.9322	0.83082

Table 6. Benchmark for Naive Bayes.

Naive Bayes
Metrics/Distribution	Accuracy	Precision	Recall	F1 Score
Gaussian	65.13%	66.42%	0.9401	0.77843
Bernoulli	59.37%	66.93%	0.7439	0.70462
Multinomial	66.62%	66.98%	0.9618	0.78966

Table 7. Benchmark for Decision Tree.

Decision Tree
Metrics/Criteria	Accuracy	Precision	Recall	F1 Score
Gini index	71.58%	74.61%	0.8546	0.79670
Entropy	71.83%	74.87%	0.8544	0.79808

Table 8. Benchmark for Multilayer Perceptron (MLP).

Multilayer Perceptron—MLP
Metrics/Learning Rate and Momentum	Accuracy	Precision	Recall	F1 Score
learning_rate = constant, momentum = 0.1	69.86%	70.48%	0.70633	0.70556
learning_rate = invscaling, momentum = 0.9	61.1%	65.23%	0.64333	0.64781
learning_rate = adaptive, momentum = 0.9	76.3%	76.41%	0.76466	0.76441

Table 9. Benchmark for Support Vector Machine (SVM).

Support Vector Machine—SVM
Metrics/Kernel	Accuracy	Precision	Recall	F1 Score	Accuracy	Precision	Recall	F1 Score	Accuracy	Precision	Recall	F1 Score
Cost	1				2				3
Kernel = Linear, Gamma = scale	66.25%	66.25%	1.0	0.79699	66.25%	66.25%	1.0	0.79699	66.25%	66.25%	1.0	0.79699
Kernel = Linear, Gamma = auto	66.25%	66.25%	1.0	0.79699	66.25%	66.25%	1.0	0.79699	66.25%	66.25%	1.0	0.79699
Kernel = RBF, Gamma = scale	72.08%	73.35%	0.9086	0.81173	72.80%	73.83%	0.91320	0.81646	73.28%	74.27%	0.91283	0.81902
Kernel = RBF, Gamma = auto	74.92%	75.53%	0.91924	0.82927	75.58%	76.43%	0.91283	0.83198	75.78%	76.61%	0.91320	0.83318
Kernel = POLY, Gamma = scale	66.25%	66.25%	1.0	0.79699	66.25%	66.25%	1.0	0.79699	-	-	-	-
Kernel = POLY, Gamma = scale	66.25%	66.46%	0.99018	0.79539	66.27%	66.48%	0.99018	0.79551	66.33%	66.51%	0.99056	0.79581
Kernel = POLY, Gamma = scale	66.95%	67.82%	0.95358	0.79265	-	-	-	-	-	-	-	-
Kernel = POLY, Gamma = auto	66.25%	66.25%	1.0	0.79699	-	-	-	-	-	-	-	-
Kernel = POLY, Gamma = auto	-	-	-	-	66.30%	66.51%	0.98981	0.79557	-	-	-	-

Table 10. Benchmark for prediction models.

Comparative Benchmark between Prediction Models
Metric/Prediction Model	Accuracy	Precision	Recall	F1 Score	ROC	Time (s)
K-Nearest Neighbor—KNN	75.27%	74.93%	0.9322	0.83082	0.76202	69.995
Naive Bayes	66.62%	66.98%	0.9618	0.78966	0.64363	0.0826
Decision trees	71.83%	74.87%	0.8544	0.79808	0.69686	0.5455
Multilayer Perceptron—MLP	76.3%	76.41%	0.7646	0.76441	0.84300	286.023
Support Vector Machine—SVM	75.78%	76.61%	0.91320	0.83318	0.73772	0.00101

Table 11. Clinical questionnaire.

Individual Record Form—Hospitalised Severe Acute Respiratory Syndrome Cases
1. Did the patient have a fever?	9. Did the patient experience vomiting?
2. Did the patient have a cough?	10. Did the patient use ventilatory support?
3. Did the patient have dyspnea?	11. Did the patient experience a loss of smell?
4. Did the patient have a sore throat?	12. Did the patient experience a loss of taste?
5. Did the patient have abdominal pain?	13. Does the patient have any risk factors?
6. Did the patient experience fatigue?	14. Does the patient have obesity?
7. Did the patient have diarrhoea?	15. Was the patient vaccinated against influenza in the last campaign?
8. Did the patient have O₂ saturation <95%?

Table 12. Prediction of clinical evolution.

Questions/Patients	Patient 01	Patient 02	Patient 03	Patient 04	Patient 05
1. Did the patient have a fever?	1—Yes	2—No	1—Yes	1—Yes	1—Yes
2. Did the patient have a cough?	1—Yes	2—No	1—Yes	1—Yes	1—Yes
3. Did the patient have dyspnea?	1—Yes	1—Yes	1—Yes	2—No	2—No
4. Did the patient have a sore throat?	0—Ignored	2—No	2—No	2—No	2—No
5. Did the patient have abdominal pain?	0—Ignored	2—No	2—No	2—No	2—No
6. Did the patient experience fatigue?	1—Yes	2—No	1—Yes	1—Yes	1—Yes
7. Did the patient have diarrhoea?	1—Yes	2—No	1—Yes	2—No	2—No
8. Did the patient have O₂ saturation < 95%?	0—Ignored	2—No	1—Yes	2—No	2—No
9. Did the patient experience vomiting?	1—Yes	2—No	2—No	2—No	2—No
10. Did the patient use ventilatory support?	0—Ignored	1—Yes	1—Yes	1—Yes	2—No
11. Did the patient experience a loss of smell?	0—Ignored	2—No	1—Yes	1—Yes	1—Yes
12. Did the patient experience a loss of taste?	0—Ignored	2—No	1—Yes	1—Yes	1—Yes
13. Does the patient have any risk factors?	1—Yes	1—Yes	2—No	2—No	1—Yes
14. Does the patient have obesity?	1—Yes	2—No	2—No	1—Yes	2—No
15. Was the patient vaccinated against influenza in the last campaign?	0—Ignored	1—Yes	1—Yes	1—Yes	1—Yes
Clinical Case Evolution	73%—the case progressed to the cure (1) of the patient	84%—the case progressed to death (2) of the patient	92%—the case progressed to death (2) of the patient	86%—the case progressed to death (2) of the patient	85%—the case progressed to the cure (1) of the patient

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
