Lightweight Neural Network for COVID-19 Detection from Chest X-ray Images Implemented on an Embedded System †

: At the end of 2019, a severe public health threat named coronavirus disease (COVID-19) spread rapidly worldwide. After two years, this coronavirus still spreads at a fast rate. Due to its rapid spread, the immediate and rapid diagnosis of COVID-19 is of utmost importance. In the global ﬁght against this virus, chest X-rays are essential in evaluating infected patients. Thus, various technologies that enable rapid detection of COVID-19 can offer high detection accuracy to health professionals to make the right decisions. The latest emerging deep-learning (DL) technology enhances the power of medical imaging tools by providing high-performance classiﬁers in X-ray detection, and thus various researchers are trying to use it with limited success. Here, we propose a robust, lightweight network where excellent classiﬁcation results can diagnose COVID-19 by evaluating chest X-rays. The experimental results showed that the modiﬁed architecture of the model we propose achieved very high classiﬁcation performance in terms of accuracy, precision, recall, and f1-score for four classes (COVID-19, normal, viral pneumonia and lung opacity) of 21.165 chest X-ray images, and at the same time meeting real-time constraints, in a low-power embedded system. Finally, our work is the ﬁrst to propose such an optimized model for a low-power embedded system with increased detection accuracy.


Introduction
COVID-19 is a highly contagious and infectious disease that causes severe infection of the lower respiratory system. COVID-19 was officially declared a global pandemic by the World Health Organization (WHO) on 11 March 2020 [1]. The total number of COVID-19 cases worldwide up to this point, which changes almost every minute, is over 328 million, including 5 million deaths and 55 million active patients [2]. The increasing number of deaths from COVID-19 and the rapid rate of disease transmission makes it one of the world's most significant and most serious public health issues.
The clinical findings of COVID-19 infection are those of bronchopneumonia, where cough, dyspnea, fever and respiratory failure with acute respiratory distress syndrome (ARDS) occur [3,4]. Radiographic imaging is the fastest, cheapest, and most essential diagnostic tool for detecting pneumonia while providing lower patient radiation than computed tomography (CT) and magnetic resonance imaging (MRI) [5,6].
In COVID-19 disease, the lungs are affected resulting in severe viral pneumonia [7][8][9]. Deaths from viral pneumonia due to COVID-19 are constantly increasing day by day. Thus, a valid diagnosis of COVID-19 pneumonia is vital but often a difficult task due to the similar symptoms and radiological findings that occur between patients infected with COVID- 19 and patients with viral pneumonia [10][11][12]. This difficulty can be alleviated by enhancing the arsenal of health workers using artificial intelligence in the form of machine learning.
In recent years, artificial intelligence (AI) has been established as a powerful tool in the field of medical services for classification, prediction, diagnosis, detection and segmentation [13][14][15][16][17][18]. In medical imaging, DL in chest X-rays has been shown to play an essential role in the detection of viral pneumonia, bacterial pneumonia and other chest diseases [19,20]. With the rapid spread of the virus and the fight against pandemics, researchers are using state-of-the-art DL techniques to classify COVID-19 X-rays [21][22][23][24]. Therefore, the design and development of DL models for the classification of COVID-19 infected radiographs is an urgent need to address the pandemic to make appropriate clinical decisions and significantly reduce workload [25].
The main goal of this study is to propose a robust, lightweight neural network, enhancing the work in [26], that provides high accuracy of chest X-ray prediction for COVID-19 detection executed on a low-power embedded system. The classic deep CNNs such as VGG16, VGG19 [27] have excellent performance, but they are difficult to deploy on devices with low hardware, due to the amount of model parameters and computational complexity. We use low-power embedded systems to design a low-cost portable point-of-care medical device that would be affordable for any hospital to acquire [28,29].
Many models have been developed and modified, but there is still room for improvement, as our research proves.
The main contributions of the proposed study can be summarized as follows: • We propose a modified neural network structure of the MobileNetV2 model, to maximize the learning ability for classification of chest X-rays. The modified version of our architecture requires significant less training time than other existing DL architectures due to the small number of network parameters. • Our design can classify four different categories of chest X-rays (COVID-19, normal, viral pneumonia and lung opacity). The accuracy of our approach is significantly higher than standard architecture and surpasses other state-of-the-art methods. • This is the first study to investigate a large set of chest X-ray images (21.165 chest X-ray images) combined from many other studies that appear in the literature, which include few X-ray samples and mainly concern binary classifications. • The modified structure of the architecture yields excellent classification results and, in combination with the small size of the network, leads to an attractive model for diagnosing chest X-rays in embedded systems.
The rest of the paper is organized as follows. In Section 2, we summarize related work. Subsequently, in Section 3 we give an outline of the procedure followed for the proposed approach, while in Section 4 we present the experimental results, and in Section 5 we conclude the article.

Related Work
In the global fight against the virus, the rapid and valid assessment of COVID-19 infected patients is a major challenge for the research community. Several studies have been performed to demonstrate the potential of DL in detecting COVID-19 infected chest X-rays in the field of medical imaging. Kamal [35] the authors proposed a model called DarkCovidNet. In their tests, the dataset they used consisted of 127 COVID-19, 500 normal and 500 pneumonia. Their DL model was trained for 100 epochs and for 3 class classification achieved an accuracy of 87.02%. Furthermore, in [36] compared five pre-trained CNN models (MobileNetV2, VGG19, Xception, InceptionV3 and InceptionResNetV2). They used two sets of data, the first set consisting of 224 COVID-19, 504 normal, 700 bacterial pneumonia, while the second set consisted of 224 COVID-19, 504 normal, 714 bacterial and viral pneumonia. Each CNN model was trained with the same hyperparameters for 10 epochs. The DL MobileNetV2 architecture in the first set for 3 class classification achieved 92.85% accuracy and in the second dataset achieved 94.72% accuracy. Misra et al. [37] proposed a multi-channel pre-trained ResNet architecture. The dataset they used consisted of 1.579 normal, 4.245 pneumonia and 184 COVID-19 images. Their DL model was trained up to 500 epochs with early stopping criteria. Their model for 3 class classification achieved 93.90% accuracy.
In [38] the authors evaluated the VGG16, VGG19, DenseNet201, InceptionResNetV2, InceptionV3, Resnet50 and MobileNetV2 models. The dataset they used in their tests consisted of 2.780 images of bacterial pneumonia, 231 of COVID-19 and 1.583 normal. Each CNN model was trained with the same hyperparameters for 300 epochs. The InceptionRes-NetV2 model for 3 class classification achieved 92.18% accuracy, while the MobileNetV2 85.47%. Narinet al. [39] compared five pre-trained CNN models (ResNet50, ResNet101, ResNet152, InceptionV3 and InceptionResNetV2). In their study they applied three different binary classifications with four classes (COVID-19, viral pneumonia, bacterial pneumonia, normal). The used dataset in their tests consisted of 341 COVID-19, 2.800 normal, 2.772 bacterial pneumonia and 1.493 viral pneumonia chest radiography images. Each DL model trained 30 epochs. The results showed that the pre-trained ResNet50 model provided the highest classification performance in all three different binary classifications with an accuracy of over 96%. The ResNet50 model they propose consists of over 25 million parameters.
Most of the above studies used a limited dataset, which varies and includes minimal samples of COVID-19 X-rays ranging from 100 samples up to 500 using millions of parameters. Studies have shown an urgent need for efficient DL models in large-scale data with fewer parameters that will provide higher accuracy. This work presents a modified version of the MobileNetV2 model for detecting COVID-19 infection from chest X-ray images and focuses on 4 class classifications.
In contrast with all other authors, we propose a well-tuned DL architecture that can be executed on a low-power embedded system in real time (sustainable performance with over 120 fps). Furthermore, we have trained our model using a dataset of over 21 k images. In our study, our goal is for the model to combine small network size a small number of parameters and to show very high performance for the classification of the four most important cases for lung deceases (COVID-19, normal, lung opacity and viral pneumonia) in a large dataset 21.165 chest X-ray images.

Materials and Methods
In this section, we will describe the dataset. Then, we will describe the details of the proposed model architecture, the training on the embedded GPU and the performance metrics.

Dataset Description
In this study, the dataset we used in our experiments is the COVID-19 Radiography Database [40]. This dataset is currently one of the largest public databases with 21.165 chest X-ray images and includes 3.616 COVID-19 positive cases, 10.192 normal, 6.012 lung opacity (Non-COVID lung infection) and 1.345 viral pneumonia images. All the images are in PNG file format, and the resolution is 299 × 299 pixels. Figure 1 shows a sample of the COVID-19 Radiography Database (normal, COVID-19, lung opacity and viral pneumonia).

Splitting the Dataset
We randomly split the dataset into training 70% (14.813 images), validation 20% (4.232 images), and testing 10% (2.120 images). The training set consists of 14.813 images, in which there were 2.530 COVID-19 images, 7.134 normal images, 4.208 lung opacity images and 941 viral pneumonia images. The validation set consists of 4.232 images, in which there were 723 COVID-19 images, 2.038 normal images, 1.202 lung opacity images and 269 viral pneumonia images. The test set consists of 2.120 images, in which there were 363 COVID-19 images, 1.020 normal images, 602 lung opacity images and 135 viral pneumonia images. The images used for testing were never used in the training process. The summary of the dataset (training, validation, and test) after the split is presented in Table 1.

Data Pre-Processing
Most CNNs are trained on image resolution with 224 × 224 pixels [41]. Therefore, in all the images of the COVID-19 Radiography Database [40] we changed the size from 299 × 299 pixels to 224 × 224 pixels. The size change was performed using the Python Imaging Library (PIL). The data augmentation methods such as random rotation, width shift, height shift, horizontal and vertical flip operations were not used because they have many limitations on medical images due to their strict format [42].

Proposed Modified Model Architecture
Most state-of-the-art CNNs have many millions of parameters to improve accuracy, and thus they are very time consuming during the training process. The proposed model is based on the MobileNetV2 architecture [41]. MobileNetV2 is one of the most popular lightweight network architectures for vision applications (classification, object detection and semantic segmentation). It is a very efficient model that improves the state-of-the-art performance on many visual recognition tasks. The MobileNetV2 is based on an inverted residual and linear bottleneck, significantly decreasing the number of operations and memory needed while retaining high accuracy. Although the classification accuracy of MobileNetV2 is 71% top-1 accuracy such as that of deep CNNs such as VGG16 and VGG19 (up 138 million parameters and 520 MB size) [27], it has the unique advantages of a smaller size network, fewer parameters, fewer operations, high efficiency, low-power and low latency.
The weights of the network are initialized with weights from a model pre-trained on ImageNet [43]. The top layer of the MobileNetV2 is discarded. To enhance the learning ability of the model, we added three additional new layers. In contrast with all other authors, we present an enhanced MobileNetV2 with more layers, improving the learning ability. The number and types of layers were determined after enough research into various networks and performing many different executions.
The first layer that we added is a global average pooling to minimize the overfitting and the number of parameters in the model by applying corresponding mathematical computation. Global average pooling generates one feature map for each corresponding category of the classification task in the last fully connected layer. The advantage of global average pooling is more native to the convolution structure between feature maps and categories [44].
After that, a fully connected layer with 512 nodes is added as a second layer with Rectified Linear Unit (ReLU) activation function [45]. The fully connected layer functions similar to a multilayer perceptron. Mathematical computation of ReLU activation function is shown in Equation (1).
A dropout layer is added, as the third layer, with a dropout rate of 20% to prevent network overfitting and divergence [46][47][48]. Dropout is a technique that randomly sets a portion of units, along with their connections, from the fully connected layers to zero during training. Dropout improves the performance of neural networks on supervised learning tasks in computer vision [48].
Finally, a fully connected layer is added as an output layer with 4 nodes, each one for every category, and SoftMax [49] is used as the activation function. This layer is used to predict output images. Mathematical computation of SoftMax activation function is shown in Equation (2).
where x i represent input data and m the number of classes. The details of the architecture, parameters and output shape of the proposed model are presented in Table 2. It is important to notice that the global average pooling and the dropout layer consist of 0 parameters, while the fully connected layer with 512 nodes required our model to carry only 655.872 parameters but gave it higher accuracy than the standard architecture (omitting our 3 addition layers). Our model has a total of 2.915.908 parameters and is much lighter than many other studies that appear in the literature, which include more than 25 million parameters. The overall proposed methodology of the model is shown in Figure 2.

Training Strategy
We train our model using a two-step strategy. In the first step, we freeze the internal layers of the network and train only the three new layers (global average pooling layer, fully connected ReLU layer and dropout layer) for 10 epochs using an Adam optimizer [50] (β1 = 0.9 and β2 = 0.999) with a learning rate of 0.001. In this step, the learning is enhanced in the features of our dataset.
In the second step, we are reducing the learning rate of 0.0001 for 20 epochs and training throughout the network to inform all the weights in the network. The batch size of 32 remained constant throughout the training. In both phases of training, the activation function at intermediate layers is ReLU, and at the output layer is SoftMax. Loss function is categorical cross-entropy, and the dropout value is 0.20. The hyperparameters used are presented in Table 3.

Training the Model on the Embedded GPU
Training and testing are performed on the Nvidia Jetson AGX Xavier [51] with 512 CUDA cores, an 8 core ARM processor, 32 GB RAM, 32 GB eMMC and various other peripherals. Nvidia Jetson CPU-GPU heterogeneous architecture achieves high-performance computing and where can be quickly programmed to accelerate complex DL tasks [52]. The embedded GPU has shown great acceleration potential and involves low-power, high accuracy and efficiency in point-of-care medical applications [28,29,53,54]. At the same time, they provide local processing, eliminating security and privacy issues where required in biomedical systems [55]. Finally, we need a portable tool to facilitate medical diagnosis with low-weight, low-cost devices with accuracy, speed and power efficiency. The CNN models have trained on the Keras [56] framework.
The system can process up to 115 images per second suitable for real-time processing with approximately 28.5 Watts of maximum power consumption, which is much lower than the typical 1 kWatt for a PC with a GPU that executes very large DL models and cannot reach our highly optimized accuracy.

Model Evaluation Metrics on the Test Dataset
Four metrics (accuracy, precision, recall and f1-score) [57] were used to evaluate the performance of the standard MobileNetV2 and the modified MobileNetV2 architectures. The calculation types of the metrics are shown in Equations (3)

Experimental Results
All executions were performed on the Jetson AGX Xavier embedded system. The total training times of standard MobileNetV2 were 1 h 38 min 22 s, and for the modified MobileNetV2, it was 1 h 40 min 52 s. Table 4 shows the performance comparison of the standard MobileNetV2 and the modified MobileNetV2. The total accuracy is 95.80% in the modified MobileNetV2, and the standard MobileNetV2 is 90.47%. The support is the number of occurrences of each class in the testing dataset, as shown in the classification report.
In Figures 3 and 4 we show the training accuracy/loss and validation accuracy/loss for standard MobileNetV2 and modified MobileNetV2, respectively. Modified MobileNetV2 architecture achieved 95.80% testing accuracy, while MobileNetV2 standard 90.47% for 4 classes of the public COVID-19 Radiography Database. It is clear that the performance of our modified MobileNetV2 model is much better than the standard MobileNetV2. Additionally, the modified MobileNetV2 model is more stable, while the standard MobileNetV2 shows more oscillation. The confusion matrix of the standard MobileNetV2 and proposed method is shown in Figures 5 and 6. It is evident that our modified MobileNetV2 achieves much better results, classifying correctly many more images than the standard MobileNetV2.

Comparison with Other Approaches
Comparing the performance of the proposed architecture with other existing methods is not possible because the dataset is different. Most studies are mainly concerned with binary classification, while there is a relatively limited number of studies concerning multiple classes.
However, in Table 5, we summarize the overall accuracy and architecture performance compared to our modified model from other existing models. Specifically, we only look at architectures used to classify 3 or 4 classes, but most have a small number of COVID-19 samples. It is evident that our model has much higher accuracy than every other research, backed up by the biggest dataset of X-rays examined.
As shown in Table 5, the proposed dataset presents the largest number of COVID-19 cases. We introduce a well-tuned DL technique that can be executed on a low-power embedded system in real time with promising results. We propose three additional new layers (global average pooling, fully connected layer with 512 nodes and dropout) and achieve high performance, small network size and a small number of parameters. We achieve the highest accuracy compared to the literature review with an accuracy of 95.80% for multiclass classification.

Conclusions and Future Work
COVID-19 has spread rapidly worldwide and is a severe public health problem. Thus, an early and valid prognosis of COVID-19 infected patients is vital to preventing the spread of the disease. This study proposes a lightweight neural network that provides high prediction accuracy from chest X-rays. The results of this study seem promising, but they must be further improved as more and more COVID-19 chest X-ray data become available. In fact, the model benefits of the numerosity of the examined X-ray dataset in reaching much higher accuracy than every other compared research. Our architecture approach achieves 95.80% accuracy, the highest up to this date compared with similar research, and processes up to 115 images per second with approximately 28.5 Watts of maximum power consumption on the Nvidia Jetson AGX Xavier. Furthermore, our work is the first to propose an optimized model for a low-power embedded system with increased detection accuracy. In a later study, we will test the performance of the proposed approach on different modern CNN models with an even larger number of COVID-19 and the number of classes chest X-rays in the dataset.

Informed Consent Statement: Not applicable
Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://www.kaggle.com/tawsifurrahman/covid19-radiography-database/activity (accessed on 25 May 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: