1. Introduction
With the emergence of the novel coronavirus (SARS-CoV-2) in December 2019, first detected in the city of Wuhan, China, there was a major outbreak of the associated disease (COVID-19), which can cause severe acute respiratory syndrome. More importantly, this virus can be transmitted directly from human to human, making it difficult to contain. COVID-19 was rapidly observed in virtually all countries, triggering a severe public health crisis worldwide [
1,
2]. As a consequence, the World Health Organization (WHO) recognized this public health emergency as an ongoing pandemic on 11 March 2020 [
3]. Coronaviruses (CoV) belong to a large family of viruses that cause illnesses ranging from the common cold to more severe diseases such as the Middle East Respiratory Syndrome (MERS-CoV) and the Severe Acute Respiratory Syndrome (SARS-CoV) [
4].
As of 30 August 2020, the number of Coronavirus cases in the world is approximately hitting the
million mark, with the total number of deaths surpassing 849,958 and an associated mortality rate of about 6% [
5]. Statistics show that about
of the COVID-19 cases have milder symptoms like fever, cough, and dyspnea. However, more serious cases can cause severe acute respiratory syndrome, pneumonia, and multi-organ failure [
4]. With the number of cases increasing daily, most countries find it challenging to keep up with the number of hospitalized patients, more so in Intensive Care Units (ICU). The ICUs are mostly occupied by patients suffering from COVID-19-related pneumonia [
4]. Ultimately, the development of a vaccine is necessary for the prevention and eradication of SARS-CoV-2. However, as the development of such vaccines is still a work in progress, early diagnosis, improved treatment of critical cases, and prevention of the spread through lockdowns are vital to reduce mortality rates [
6].
The gold standard for the diagnosis of COVID-19 patients is the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) technique. However, there was an inadequate number of test kits for SARS-CoV-2 during the early outbreak of the disease. The RT-PCR test also produces a high rate of false-negative results, due to sample preparation and quality control in particular [
7]. In addition, viruses such as influenza A and influenza B can cause symptoms similar to those of SARS-CoV-2, making it harder to differentiate between COVID-19 and non-COVID-19 cases, more so in the flu season [
8]. This uncertainty can lead to a broader spread of the disease if people with symptoms move about freely without being tested [
8]. Many overpopulated countries like India and Bangladesh have failed to conduct enough tests due to limited resources for guaranteeing widespread test kit availability [
9,
10]. Therefore, it is pertinent to develop a cost-effective and reliable screening method for early diagnosis, so that a larger population can benefit from it.
Artificial intelligence (AI) is an emerging branch of computer science with demonstrated potential in a wide variety of fields, with applications ranging from decision tools in the energy and financial sectors [
11,
12] to medical imaging and diagnosis. With the unique capabilities of AI, safe, accurate, and efficient imaging solutions can be attained. In fact, AI has recently gained popularity as a useful tool for clinicians [
13,
14,
15,
16,
17,
18]. Over the years, similar to many other fields of research, deep learning (a sub-area of the machine learning field, inspired by the architecture of the brain [
19]) has shown impressive performance in the field of medical image processing [
4]. By applying deep learning techniques, it is possible to draw meaningful results from medical data [
15,
20]. By benefiting from deep learning capabilities like image recognition and segmentation, detection and diagnosis of diseases like diabetes mellitus, brain tumors, skin cancer, and breast cancer have been both efficient and useful [
21,
22,
23,
24,
25,
26].
Recent studies show that AI-based applications can reduce dependency on the limited supply of RT-PCR test kits [
27]. Even if the RT-PCR test shows negative results, symptoms can be identified by examining chest radiological imaging, namely, chest X-ray images [
28,
29]. X-ray machines are popular tools for diagnosing injuries and diseases in most healthcare facilities and have been widely used by care centers and hospitals throughout the current pandemic [
27,
30]. From a global perspective, X-ray exams are comparatively affordable in developing countries, with exam costs reaching as low as 5 USD [
31]. In developed countries, as an effect of a more costly healthcare infrastructure, X-ray exams may become more expensive, but are often covered by nearly
of public and private health insurance policies in countries like Australia, Canada, Germany, and Japan, and
in the USA [
32]. Individual charges and copays range from 0 to 50 USD in those countries [
33]. Regarding the common concern of exposure to ionizing radiation from X-rays, individual exams are known to be safe and expose the patient to significantly less radiation than, for instance, Computed Tomography (CT) exams [
27].
As a result, chest X-ray imaging has recently drawn the attention of researchers and practitioners for the early diagnosis of COVID-19 patients with pneumonia symptoms [
34]. For instance, Chen et al. (2020) used a deep-learning-based model for early detection of COVID-19-related pneumonia using image data from patients at the Renmin Hospital of Wuhan University [
35]. Narin et al. (2020) describe the use of X-ray images for the automatic detection of the coronavirus by implementing a Deep Convolutional Neural Network, achieving an accuracy of around 98% using the ResNet50 model [
4]. In addition, Goshal and Tucker (2020) and Wang and Wong (2020) each developed a Convolutional Neural Network (CNN) to classify COVID-19 and non-COVID-19 cases using X-ray images, with approximately 92.9% and 83.5% accuracy, respectively [
36,
37].
Additionally, numerous other recent studies have been carried out with CT images using several deep learning models [
38,
39,
40,
41]. Likewise, machine learning (ML) algorithms using numerical/categorical data have also been utilized for the diagnosis of COVID-19. A number of studies [
34,
39,
42] developed machine learning models based on Lasso regression and multivariate logistic regression for early identification of COVID-19 patients. Some of the more significant factors in these studies were age, temperature, heart rate, blood pressure, fever, sex, uric acid, triglyceride, and serum potassium.
Although the Centers for Disease Control and Prevention (CDC) currently does not recommend it, many studies in this field of research use chest radiography or CT scan images to diagnose COVID-19 [
43,
44]. For instance, a recent report in the journal Applied Radiology (22 March 2020) [
44] claimed that diagnosis based on radiological images alone can misclassify patients with ARDS (acute respiratory distress syndrome [
45]) or SARS (severe acute respiratory syndrome [
46]) as COVID-19, which is a clear drawback. Articles by Greenfieldboyce [
47] and Jewell [
48] suggested that patient information such as age, gender, temperature, and chronic disease history are significant predictors for identifying COVID-19 patients. With this in mind, some studies in this field also use numerical or categorical information such as age, gender, body temperature, and chronic disease history for the diagnosis of COVID-19. For instance, Bai et al. (2020) use CT images (image data) together with demographics, signs, and symptoms (numerical/categorical data) to establish an Artificial Intelligence (AI) model that identifies patients with mild symptoms who are at risk of malignant progression [
49]. However, none of the previous studies considered numerical data, categorical data, and chest X-ray images in combination. Thus, a model that couples numerical/categorical data with chest X-ray images may offer a new, reliable alternative for screening patients with COVID-19 symptoms.
Taking these opportunities into account, our study focuses on mixed-data analysis using both image and numerical/categorical data to assist the early diagnosis of COVID-19 patients using a deep learning approach. A deep Multilayer Perceptron-Convolutional Neural Network (Deep MLP-CNN) model is proposed, considering the age, gender, temperature, and chest X-ray images of patients. The model was tested under two conditions: a balanced dataset (containing 13 COVID-19 and 13 non-COVID-19 patients), henceforth referred to as Study One, and an imbalanced dataset (containing 112 COVID-19 and 30 non-COVID-19 patients), referred to as Study Two.
2. Dataset and Methodology
We adopted a COVID-19 data set containing both X-ray images and numerical/categorical data for each patient, collected from the open-source GitHub repository shared by Dr. Joseph Cohen [
50]. This database is continuously updated with data shared by several entities around the world and has been used in many studies to detect COVID-19 patients with various data mining techniques. At the time of our study, the dataset contained data from 184 different patients with information such as age, gender, temperature, survival, intubation, partial pressure of oxygen dissolved in the blood (PO2), and classification as COVID-19, SARS, Pneumocystis, E. coli, Streptococcus, or “no findings” patients. For simplicity, we organized the dataset into two groups: COVID-19 patients, and all others as non-COVID-19 patients (
Figure 1).
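The two-group relabeling described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field name `finding`, and the sample values are assumptions made for the example and are not taken from the repository's actual schema.

```python
# Hypothetical records; the field name "finding" and the values are
# illustrative assumptions, not the real column names of the dataset.
records = [
    {"patient": 1, "finding": "COVID-19"},
    {"patient": 2, "finding": "SARS"},
    {"patient": 3, "finding": "Streptococcus"},
    {"patient": 4, "finding": "No finding"},
]


def to_binary_label(finding):
    """Return 1 for COVID-19 and 0 for every other diagnosis."""
    return 1 if finding.strip().upper() == "COVID-19" else 0


# 1 = COVID-19, 0 = non-COVID-19 (all remaining classes pooled together)
labels = [to_binary_label(r["finding"]) for r in records]
print(labels)
```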
One of the challenges associated with this dataset was missing data for some parameters across patients. Because of that limitation, for Study One (balanced dataset), a small dataset was set up with 13 COVID-19 and 13 non-COVID-19 patients, considering age, gender, temperature, and chest X-ray images as variables. Since there were numerous missing entries in the temperature column, only rows with complete information for these variables were taken into account. No statistically significant difference (p values were obtained using chi-square tests (*) and t-tests (**)) was found between the COVID-19 (6 female, 7 male) and non-COVID-19 (5 female, 8 male) groups in terms of sex distribution (p * = 0.69 > 0.05) or mean age and temperature (p ** = 0.49 > 0.05). For Study Two, in contrast, the dataset was enlarged by ignoring the “Temperature” column entirely. In this case, an imbalanced dataset was constructed with information from 142 patients (112 COVID-19, 30 non-COVID-19) to assess the model’s performance under class imbalance. No statistically significant difference was observed between the COVID-19 and non-COVID-19 groups with regard to sex distribution (p * = 0.34 > 0.05) or mean age (p ** = 0.06 > 0.05).
Table 1 summarizes the datasets used for both studies.
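As a worked check of the sex-distribution comparison for Study One, the following minimal sketch applies a Pearson chi-square test to the 2x2 table built from the counts quoted above (6 female and 7 male COVID-19 patients; 5 female and 8 male non-COVID-19 patients). It assumes no continuity correction, which is our assumption and not something the text states; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) for one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: COVID-19, non-COVID-19; columns: female, male (Study One counts).
study_one_sex = [[6, 7], [5, 8]]
stat, p_value = chi_square_2x2(study_one_sex)
print(f"chi2 = {stat:.3f}, p = {p_value:.2f}")
```

Under this assumption the sketch gives p ≈ 0.69, in line with the value reported for the sex distribution in Study One.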
The MLP-CNN models were implemented, and their computational times measured, using the Anaconda distribution with Python 3.7 on an office-grade laptop with common specifications (Windows 10, Intel Core i7-7500U, and 16 GB of RAM).
2.1. Proposed Model
Neural Networks (NN) have recently shown more promising results than traditional machine learning (ML) algorithms such as linear regression, logistic regression, and random forest on high-dimensional datasets, particularly when they contain combined numerical, categorical, and image data [
51]. Classical ML approaches may perform better with a small dataset, as they are computationally inexpensive and easily interpretable. However, as the size of the data increases (big data), handling it becomes challenging for traditional ML approaches. Conversely, deep NN methods offer an opportunity to develop a more robust model that performs well on both small and large datasets, mainly due to recent advancements in NN approaches such as transfer learning, Recurrent Neural Networks (RNN), and CNN. Additionally, classical ML approaches often require sophisticated feature engineering or dimensionality reduction [
52]. In contrast, deep NN methods provide better feature engineering, can be implemented directly, and achieve good results [
53].
We developed a deep learning-based model inspired by [
54]. Our choice for this architecture was motivated by its predictive performance on visual and textual features, addressed in many recent papers [
55,
56,
57,
58]. Our proposed model is a combination of a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN). The MLP was used to handle the numerical/categorical data, while the CNN was used to extract features from the X-ray images. Parameter tuning was performed to improve the model’s performance, mainly on the number of hidden layers, number of neurons, epochs, and batch size. At first, the number of hidden layers and the number of neurons were set randomly; the optimal parameters were later determined using the grid search method. The optimized parameters were as follows: Learning Rate = 0.001, Batch Size = 5, Epochs = 50. Finally, the proposed MLP model was combined with the CNN architecture, as suggested by [
54]. As shown in
Figure 2, the highest accuracy (100%) was achieved at 50 epochs, while the training loss was minimized up to 85%.
Table 2 shows how different numbers of neurons and hidden layers affect the MLP models. Based on our experiment, with two hidden layers and four neurons, it is possible to achieve 100% accuracy while reducing the loss up to 100%.
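The grid search mentioned above can be illustrated with a minimal, library-free sketch. The candidate values in `param_grid` and the scoring function `toy_score` are hypothetical: in practice the score would be the validation accuracy of a trained MLP-CNN, whereas the placeholder here is a simple distance to a known optimum so that the example runs on its own.

```python
import itertools


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Placeholder objective, NOT a real validation score: it only makes
    # the search have a known optimum for demonstration purposes.
    return -(
        abs(params["learning_rate"] - 0.001)
        + abs(params["batch_size"] - 5) / 5
        + abs(params["epochs"] - 50) / 50
    )


# Hypothetical candidate values; the text does not list the grid searched.
param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [5, 10, 20],
    "epochs": [10, 50, 100],
}

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

Exhaustive search is affordable here because the grid is small (27 combinations); the cost grows multiplicatively with each added hyperparameter.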
In order to obtain the best model, optimization algorithms needed to be applied during the training phase [
59]. For that purpose, we tested three popular optimization algorithms: adaptive moment estimation (Adam) [
60], stochastic gradient descent (Sgd) [
61], and root mean square propagation (Rmsprop) [
62].
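To make the difference between the three optimizers concrete, the sketch below applies their textbook update rules to a one-dimensional toy loss f(w) = (w - 3)^2. The loss, the step counts, and the hyperparameter values are assumptions chosen for illustration; they are not the settings used for the MLP-CNN model.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def run_sgd(w=0.0, lr=0.1, steps=2000):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_rmsprop(w=0.0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    # RMSprop: scale the step by a running average of squared gradients.
    v = 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1.0 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)
    return w


def run_adam(w=0.0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    # Adam: bias-corrected running averages of the gradient (m)
    # and of the squared gradient (v).
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


for name, fn in [("Sgd", run_sgd), ("Rmsprop", run_rmsprop), ("Adam", run_adam)]:
    print(name, round(fn(), 3))
```

All three runs approach the minimizer w = 3; the adaptive methods take steps whose size depends on the gradient history rather than on the raw gradient alone.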
2.2. How Our Proposed MLP-CNN Model Works
We applied the Rectified Linear Unit (ReLU) as the activation of each neuron in the input and hidden layers and utilized the “linear” function in the final layer [
63]. The first input layer of the MLP consists of eight neurons and takes the numerical/categorical data as a one-dimensional array. The hidden layer consists of four neurons and the final layer consists of one neuron. The proposed CNN model, in turn, contains three convolution layers along with three pooling layers (max pooling). The first hidden layer is a convolutional layer with 16 feature maps, each with a kernel size of 64 pixels and a ReLU activation function. It is followed by a pooling layer that takes the maximum value, configured with a pool size of (2,2). The layer following the pooling layer is a dense layer with 16 neurons and a ReLU activation function, followed by another dense layer with four neurons. Two individual outputs emerge from the two separate models: one from the MLP and the other from the CNN. Both outputs are concatenated and treated as a single input, which is then passed through two additional dense layers of four neurons each. The Keras functional API was used to concatenate the MLP and the CNN models, as it supports models with multiple inputs and outputs. Typically, such models merge inputs from different layers using an additional layer that combines several tensors, as shown in
Figure 3, which illustrates the overall diagram of our proposed end-to-end model. In summary, we have encoded the numerical/categorical inputs and the chest X-ray input as vector inputs, then concatenated these vectors. Finally, the output layer has one neuron for the two classes and a linear activation function to provide probability-like predictions for each class.
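The fusion step can be illustrated without any deep learning library. The sketch below is not the Keras implementation used in the study: it is a plain-Python forward pass, with hypothetical branch outputs and hypothetical constant weights, that shows only how two four-value feature vectors are concatenated and passed through two dense layers of four neurons and a single linear output neuron.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical outputs of the two branches (four values each).
mlp_features = [0.2, 0.0, 1.1, 0.4]   # from the numerical/categorical branch
cnn_features = [0.0, 0.7, 0.3, 0.9]   # from the chest X-ray branch

# Concatenation: the two vectors become one eight-value input.
fused = mlp_features + cnn_features

# Hypothetical constant weights, chosen only to keep the arithmetic simple.
w1, b1 = [[0.1] * 8 for _ in range(4)], [0.0] * 4
w2, b2 = [[0.25] * 4 for _ in range(4)], [0.0] * 4
w_out, b_out = [[0.25] * 4], [0.0]

hidden1 = relu(dense(fused, w1, b1))      # first dense layer, four neurons
hidden2 = relu(dense(hidden1, w2, b2))    # second dense layer, four neurons
output = dense(hidden2, w_out, b_out)     # one neuron, linear activation

print(len(fused), len(hidden1), len(hidden2), output)
```

In a trained model the weights are learned jointly, so gradients flow from the single output back into both branches.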
2.3. Experiment Setup
The performance of the model was evaluated using 5-fold cross-validation for both Studies (Study One and Study Two). The experiment was repeated five times (as shown in
Figure 4), and the overall performance of the model was computed by averaging the outcomes of all five folds.
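The fold construction behind 5-fold cross-validation can be sketched as follows. This is a simplified illustration that assumes contiguous, unshuffled, unstratified folds; the text does not state whether shuffling or stratification was applied, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    The first n_samples % k folds receive one extra sample, so every
    sample is used for testing exactly once.
    """
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Fold sizes for the two study sizes (26 and 142 patients).
for n in (26, 142):
    print(n, [len(test) for _, test in k_fold_indices(n)])
```

The overall score is then the mean of the per-fold scores, each computed on a test fold the model did not see during training.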
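The results are presented in terms of accuracy, precision, recall, and F1 score with a 95% confidence interval [
64]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where:
True Positive (TP) = COVID-19 patients classified as patients
False Positive (FP) = Healthy individuals classified as patients
True Negative (TN) = Healthy individuals classified as healthy
False Negative (FN) = COVID-19 patients classified as healthy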
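The four metrics follow directly from the confusion-matrix counts defined above. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and do not correspond to any result reported in this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a 29-sample test set (illustration only).
example = classification_metrics(tp=23, fp=1, tn=3, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced test set, accuracy alone can look favorable even when the minority class is poorly recognized, which is why precision, recall, and F1 are reported alongside it.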
3. Computational Results
First, as a means of identifying appropriate training and testing set ratios for validation, we split our data into the following training set/testing set ratios: 75:25, 70:30, 60:40, 85:15, and 80:20. Such split ratios are commonly used in deep learning techniques for model evaluation and validation [
65,
66,
67]. The best results in terms of training and testing accuracy were found when the dataset was split randomly into 80% and 20% for training and testing sets, respectively. To exemplify that,
Table 3 presents the performance of our proposed models with different ratios of randomly split data between training and testing. Since the dataset is comparatively small, reducing the training data also reduces the model’s ability to achieve high accuracy. Conversely, enlarging the training set leaves too few data points in the test set to measure the model’s overall performance with confidence.
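The trade-off between training and test set sizes is easy to see by tabulating the sample counts for each ratio. The sketch below assumes that the test set size is rounded up to the next whole sample; this rounding rule is our assumption, chosen because it reproduces the test-set sizes of 6 and 29 reported later in this section for the 26-patient and 142-patient datasets.

```python
def split_sizes(n_samples, test_percent):
    """Return (n_train, n_test), rounding the test set size up."""
    n_test = -(-n_samples * test_percent // 100)  # integer ceiling division
    return n_samples - n_test, n_test


# Train/test percentages considered in the text.
ratios = [(75, 25), (70, 30), (60, 40), (85, 15), (80, 20)]

for n in (26, 142):  # Study One and Study Two dataset sizes
    for train_pct, test_pct in ratios:
        n_train, n_test = split_sizes(n, test_pct)
        print(f"n={n:3d}  {train_pct}:{test_pct}  train={n_train:3d}  test={n_test:2d}")
```

For the 26-patient dataset, an 85:15 split leaves only four test samples, so a single misclassification changes the test accuracy by 25 percentage points.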
Training was limited to a maximum of 50 epochs to avoid overfitting. A graphical illustration of the model’s overall performance using Adam and Rmsprop is presented for the 5th fold in
Figure 5.
Each model’s average performance on both balanced (Study One) and imbalanced (Study Two) datasets along with 95% confidence intervals are displayed in
Table 4. For the balanced dataset, Adam had the highest accuracy (96.3%), precision (97.2%), recall (96.3%), and F1 score (96.4%) compared to the other two models, trained with Rmsprop and Sgd. Rmsprop outperformed all other models on the imbalanced dataset. Considering the overall performance on both datasets (average of both studies), the model trained with Adam is the best in terms of accuracy (94.6% ± 3.4%), precision (93.5% ± 3.7%), recall (94.5% ± 3.5%), and F1 score (93.5% ± 3.7%).
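A mean with a 95% confidence interval of the form "value ± half-width" can be computed from per-fold scores as sketched below. The normal approximation (z = 1.96) and the fold accuracies are assumptions for illustration; the text does not specify which interval construction was used, and the numbers are not results from this study.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, half_width) of a normal-approximation 95% interval."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, z * standard_error


# Hypothetical per-fold accuracies (illustration only).
fold_accuracies = [0.95, 1.00, 0.90, 1.00, 0.95]

mean, half_width = mean_with_ci(fold_accuracies)
print(f"accuracy = {mean:.1%} ± {half_width:.1%}")
```

With only five folds, a Student t multiplier would give a wider interval than the normal approximation used in this sketch.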
Overall execution time for both datasets is shown in
Figure 6. The lowest registered execution time was 53 s for the model trained with Rmsprop in the balanced dataset, whereas the maximum execution time was 79 s when the model was trained with Sgd. Conversely, for the imbalanced dataset, Adam showed the lowest execution time of 138 s, while Rmsprop displayed the maximum execution time of 163 s. In conclusion, when both studies are considered holistically, the average execution time for Adam was lowest in comparison with the other two.
To evaluate the predictive performance of each model, confusion matrices were generated.
Figure 7 shows confusion matrices for the models trained with Adam, Rmsprop, and Sgd, respectively, for fold 5. In Study One, the test set contained 6 patients, of whom 4 were COVID-19 and 2 were non-COVID-19. In this case, both Adam and Sgd correctly classified all the samples, while Rmsprop misclassified 4 out of 6 samples. In Study Two, 29 samples were used for the test set (25 COVID-19 and 4 non-COVID-19). Here, Adam showed the best performance by correctly classifying all 29 samples, while Rmsprop and Sgd performed worst, each correctly classifying 27 of the 29 samples.
4. Discussion
In this study, we proposed and evaluated an MLP-CNN based model that can distinguish between patients with and without COVID-19, and demonstrated the advantage of combined MLP-CNN models over a traditional CNN or MLP used alone for that purpose. Our combined model achieved an accuracy of around 96.3% (using the Adam optimization algorithm), compared with several published studies that used only CNN [
36,
37,
39] or traditional ML [
41] approaches. On the one hand, MLP models are fast and time-efficient when used with numerical and/or categorical data only. On the other hand, CNN models are notably more accurate in extracting useful features from chest X-ray images for respiratory disease diagnosis. For instance, Wang and Wong [
37] and Khan et al. [
68] used CNN-based approaches to detect the onset of COVID-19 disease using chest X-ray images and achieved an accuracy of 83.5% and 89.6%, respectively. In comparison, as previously stated, our combined model demonstrated an accuracy of around 95.4%.
In Study One, our model learned from only 26 subjects (13 COVID-19 and 13 non-COVID-19), which represents 18% of the data used by Zhang et al. [
69] and 2% by Shi et al. [
41] (see
Table 5 and
Table 6). Therefore, our proposed model may be used as a useful computer-aided diagnosis tool for low-cost and fast COVID-19 screening considering small datasets.
Additionally, our model performed better with an imbalanced dataset compared to recent studies [
37,
38,
39,
40,
41] that also used imbalanced datasets (
Table 6). For instance, Jin et al. [
38] used 1882 CT scan images where the data ratio was 1:2.78 (497 COVID-19:1385 other) and achieved 94.1% accuracy. Similarly, the data ratios of a range of other recent studies, particularly Song et al. (2020) [
39], Butt et al. (2020) [
40], and Shi et al. (2020) [
41] were 1:2.11, 1:1.82, and 1:1.62, respectively, while their measured accuracies were 82.9%, 86.7%, and 90.7%. For the imbalanced dataset, we used chest X-ray images from 112 COVID-19 and 30 non-COVID-19 patients (minority-to-majority ratio 1:3.73). Even with a higher degree of imbalance in our dataset, our model outperformed several recent similar studies [
37,
38,
39,
40,
41] by achieving a higher accuracy of 95.4%. It should be noted that all studies mentioned in
Table 6 used only image data in their experiments, while we considered a mixed-data approach, using both numerical/categorical and image data.
In summary, we have proposed an MLP-CNN based model that can distinguish between COVID-19 and non-COVID-19 patients using information such as age, gender, temperature, and chest X-ray images. Both balanced and imbalanced data were considered for the experiments, achieving an average accuracy of around 95% (96.3% from Study One and 95.4% from Study Two).
Finally, our model can be easily adopted by healthcare professionals as it is cost- and time-effective, which accelerates COVID-19 screening procedures and enables patients with the disease to be isolated at earlier stages. Real-time screening of COVID-19 patients using MLP-CNN approaches might be possible with minimal human interaction, provided that chest X-ray images and other relevant information such as age, gender, and temperature of the respective patients are available. Additionally, AI-based screening tools can be designed to present little complexity to the end user, and may not require technicians to be trained in the complex computational tools described herein. We identify the following limitations of our study, which present immediate opportunities for future investigations:
the size of the dataset adopted is comparatively small, and
only four numerical and categorical parameters were considered.