Deep Malaria Parasite Detection in Thin Blood Smear Microscopic Images

Abstract: Malaria is a fatal disease caused by microscopic parasites transmitted to humans through the bites of infected female mosquitoes, and it is endemic in many regions of the world. Quick diagnosis of this disease is very valuable for patients, as traditional methods require tedious work for its detection. Recently, some automated methods have been proposed that exploit hand-crafted feature extraction techniques; however, their accuracies are not reliable. Deep learning approaches have modernized the field with their superior performance. Convolutional Neural Networks (CNNs) are vastly scalable for image classification tasks, extracting features through the hidden layers of the model without any handcrafting. The detection of malaria-infected red blood cells from segmented microscopic blood images using convolutional neural networks can assist in quick diagnosis, which is useful for regions with fewer healthcare experts. The contributions of this paper are two-fold. First, we evaluate the performance of different existing deep learning models for efficient malaria detection. Second, we propose a customized CNN model that outperforms all observed deep learning models. It exploits bilateral filtering and image augmentation techniques to highlight features of red blood cells before training the model. Due to the image augmentation techniques, the customized CNN model generalizes well and avoids over-fitting. All experimental evaluations are performed on the benchmark NIH Malaria Dataset, and the results reveal that the proposed algorithm is 96.82% accurate in detecting malaria from microscopic blood smears.


Introduction
Malaria is a disease caused by a microscopic parasite transmitted to humans through the bites of infected female mosquitoes. The malaria parasite ruptures the red blood cells in human blood and replicates, spreading to other cells. Malaria patients usually feel very sick, with high fever, headache, muscle pain, and fatigue. According to the World Health Organization (WHO), malaria parasites caused the death of 438,000 people in 2015 and 620,000 people in 2017, and infection cases number around 300-500 million annually [1]. Light microscopy is a standard method for identifying malaria disease and all its parasite species by screening films of red blood cells. Other methods, like rapid diagnostic tests [2], are also used for a prompt parasite-based diagnosis. It is a widely used test with a false positive rate of less than 10%. It is helpful in initial diagnosis; however, its performance is affected by the quality of the product and parasite-related factors [3]. For examining malaria infection by light microscopy, a glass slide is prepared by applying a drop of blood, which is merged with the Giemsa staining solution to enhance the visibility of parasites in red blood cells under a microscope. Furthermore, thick and thin smears of blood are used for malaria diagnosis. A thick smear typically identifies the existence of the parasite in blood, while a thin smear detects the species of malaria and the parasite stages. An expert microscopist usually takes 20 to 30 minutes for a careful examination of a single blood film to count the number of infectious cells by inspecting the variations in the shape, color, and size characteristics of red blood cells.
Millions of blood smear films are manually examined by expert pathologists every year, and it takes a massive human and economic effort to diagnose malaria. Moreover, the parasite counts from blood films should be accurate for a correct diagnosis and classification of disease severity. For example, if malaria cells were not present in a patient but the doctor erroneously prescribed antibiotics, it would unnecessarily cause abdominal pain or nausea to the suspected patient [4]. Diagnosis of malaria should be robust, with high sensitivity (fewer false negatives) towards capturing parasites at all stages of the malaria life-cycle. Correct diagnosis at earlier stages can be helpful for the treatment of malaria in endemic regions where expert pathologists are few and the workload of screening blood films is massive. Automatic malaria detection methods can serve many patients with fast, cost-effective, and accurate diagnoses.
Traditional methods of automating the malaria detection process involve complex image-processing techniques with hand-engineered features, e.g., shape, color, intensity, size, and texture [5][6][7]. In these methods, the red blood cells are detected from microscopic images by using different segmentation techniques. After the selection of appropriate features for red blood cells, a computed set of features is used in the classification of segmented images into infected and uninfected classes. For example, morphology-based approaches are used to segment cell images with structuring elements to enhance the characteristics of red blood cells, such as the roundness of cells, which improves the classification accuracy. In the literature, various methods are adopted for the segmentation, feature extraction, and classification steps of malaria diagnosis [8]. After analyzing conventional and recent malaria detection methods, it is observed that there is a trade-off between accuracy and the computational complexity of models; that is, when the accuracy of a model increases, its computational complexity also increases [9]. For example, classification by a support vector machine (SVM) is faster than a deep neural network, but the accuracy of a deep neural network is found to be higher than that of an SVM.
In recent years, deep learning (DL) techniques have been exploited for automated malaria diagnosis with appreciable detection rates. Deep learning models eliminate the computation of hand-crafted features, as the hidden layers of deep models extract features automatically by analyzing the data. Deep learning models require large datasets for training neural networks and for improving the accuracy of the model. However, in medical applications like malaria diagnosis, relatively small datasets are available. This is because building an annotated dataset requires input from pathologists, which is not readily available. To overcome the paucity of data, recently introduced image augmentation techniques in deep learning models provide better generalization and reduce over-fitting. Image augmentation enlarges the dataset by taking each original image and transforming it into multiple images using transformations such as rotation, shear, and translation, thus enabling the model to achieve higher accuracy. A convolutional neural network (CNN) is widely used for classification tasks, and it is computationally efficient too [10].
In this paper, we evaluate the effectiveness of various existing deep learning models for malaria detection from microscopic blood images, and also propose an efficient DL method for the classification of infected and uninfected malaria cells. The proposed customized CNN-based algorithm outperforms all observed deep learning models. The proposed method uses bilateral filtering for improving image quality and image augmentation techniques for better generalization of the model. Our model has a simple CNN architecture containing 5 convolutional and pooling layers. The performance of the proposed method is evaluated on a benchmark malaria dataset, and the results are compared with other existing, similar techniques. The results show that our method achieves excellent performance and outperforms the compared techniques.
The rest of the paper is organized as follows. The literature on automated malaria detection is reviewed in Section 2. The various deep learning models used for performance evaluation are presented in Section 3. The proposed deep learning method is presented in Section 4. The details of the test dataset and preprocessing of the data, performance evaluation of the proposed method, and comparison of deep learning models are described in Section 5. The research is concluded in Section 6.

Related Work
The traditional pipeline for automating the process of malaria diagnosis follows four steps: image pre-processing, cell segmentation, feature selection, and classification of infected and uninfected malaria cells, as shown in Figure 1. For each step, different methods have been proposed in the literature. Image pre-processing methods enhance the quality of blood smear images to improve the accuracy of later processing steps such as cell segmentation, feature extraction, and classification. Any kind of impurities in the images can affect the performance of the later processing steps and can lead to the misclassification of malaria cells. Different smoothing filters, e.g., Gaussian, median, and geometric mean filters, are extensively used to suppress noise in microscopic images [11,12]. Morphological operators have also been used to remove impurities by improving cell contours and suppressing noise by filling holes [13]. Adaptive thresholding and histogram equalization are also used to enhance the resolution and contrast of the images [14]. Malaria detection methods, e.g., [15], used HSV color space and grayscale color normalization to reduce illumination variation in cells. Low-pass filters have also been exploited to eliminate noise-related frequency components from the microscopic images [16]. Some techniques, e.g., [17], used the Laplacian filter to sharpen edges and to enhance the red blood cell (RBC) boundaries in images. The method in [18] used the Wiener filter to remove the blurriness induced in the microscopic images due to unfocused optics. Cell segmentation is the most significant step in any automated malaria detection system. Pre-processed microscopic images are segmented into small non-overlapping regions containing the red blood cells (RBCs), the white blood cells (WBCs), the malaria parasites, and other artifacts.
Image-based methods such as Chan-Vese segmentation, hole-filling algorithms, and histogram-based methods are popular for cell segmentation in an unsupervised manner [19]. The green channel of RGB images is used to segment cells in the case of low-contrast images in [20]. In [21], the Otsu threshold is used to segment RBCs from enhanced images. In malaria detectors such as [22,23], thresholding techniques are applied in Hue-Saturation-Value (HSV) color space on the S and V channels to segment cells from microscopic images. The fuzzy divergence technique is used for cell segmentation in [24]. A fuzzy rule-based segmentation method is applied in [25] to segment malaria cells from images in three different color spaces. In [13], morphological approaches use grayscale granulometry to capture regional extrema and segment cells. The Hough transform is used in [26] to identify RBCs by their shape, and k-means clustering is used to segment cells from unlabeled data in [27]. The marker-controlled watershed algorithm is used in [28,29] to separate the overlapping cells that complicate the segmentation process. A graph-cut-based technique for cell segmentation is proposed in [30]. The methods in [31,32] utilize the structure, geometry, and color information of cells to identify WBCs and gametocytes. Machine learning approaches and neural networks are used for cell segmentation in [33].
Feature selection for cell images depends on the shape, texture, and color of red blood cells. HSV color space and green channel of RGB color space are preferred for feature extraction because the color features are prominent in stained blood images [31,34]. Histogram of oriented gradients (HOG) features [26], Haralick's texture features [35], local binary patterns [36], and various other feature-selection methods [37] have been used to extract features from cell images. Types of parasite can be identified from cell images by using the color and shape information [38]. Morphological operations like thinning and grayscale granulometry capture relevant information from the intensity of image pixels [39,40]. In [12], the support vector machine (SVM) and Bayesian learning are used for the classification of malaria cells by utilizing the discriminative feature set.
In the malaria detection algorithm presented in [41], a linear Euclidean distance classifier with a Poisson distribution and Gabor filtering is used to detect malaria cells in blood smear images. An adaptive neuro-fuzzy inference system (ANFIS) is used to diagnose the species of malaria infection in [42]. A genetic algorithm based on chromosome-encoding schemes and mutation strategies is also utilized to diagnose malaria-infected cells [43]. K-means clustering is used for the unsupervised detection of malaria-infected cells in [19]. In [44], an SVM and an artificial neural network (ANN) detect the malaria parasite by using normalized red, green, and blue information and texture features that are invariant to staining variations.
Deep learning (DL) revolutionized the traditional malaria detection pipeline by skipping the feature selection step, which requires expertise to capture variability in the angle, position, shape, texture, color, and size of objects in images. Deep learning became popular for medical image analysis because its biggest advantage is learning features from the underlying data without any designing of feature sets. Deep learning models use non-linear activation units for neurons in layers that discover underlying hierarchical patterns in the data. Features are extracted using end-to-end hidden layers that learn complex decision-making functions, leading to classification. A deep learning CNN model is used by Dong et al. [45] for cell segmentation, and deep belief networks are used for the classification of malaria-infected and uninfected images. CNN models [46] can recognize patterns in microscopic images with a much higher accuracy rate than other traditional approaches. Fully convolutional regression networks (FCRNs) [47] regress CNN spatial feature maps to detect and count cells in microscopic images. Deep learning approaches used for cell segmentation provide more accurate results compared to previous techniques.
Convolutional neural networks are used for images by exploiting the spatial local correlation between adjacent pixels through shared weights, local receptive fields, and down-sampling of feature maps. In recent research, a customized CNN with focus stack images showed improvements in classification performance [48]. In [49], a customized CNN model with 16 layers is proposed that outperforms all transfer learning models. The CNN models AlexNet and VGG-16 are compared in [50] to improve the accuracy of malaria detection. Hung et al. [51] use two-stage classification with Faster R-CNN for detecting red blood cells in images and AlexNet for the classification of malaria cells. The method in [52] proposed LeNet-5 for the automated diagnosis of malaria. VGG-16 is unified with an SVM in [53] to classify falciparum malaria cells. Pre-trained CNN models with ImageNet weights, e.g., VGG and ResNet, are used as feature extractors in [54] to classify malaria-infected and uninfected cells. The malaria detectors presented in [55,56] also use deep learning. An ensemble of three deep learning models, a customized CNN, VGG-16, and a CNN with SVM, is proposed in [57] for malaria parasite detection.

Compared Deep Learning Models
In visual recognition tasks, the computer processes images by their pixel values. To process a 30 × 30 color image (in RGB), 2700 (30 × 30 × 3) pixel values need to be processed. A neural network is a sequence of layers with neurons that distinguish underlying relationships in a set of data. In a fully connected neural network, if we take an input image with 30 × 30 × 3 pixels, each neuron in the first hidden layer will have 2700 weights. In real life, we have large images with dimensions of 250 × 250 and more. If we take an input image with 250 × 250 × 3 pixels, then each neuron in the first hidden layer will have 187,500 weights. Therefore, we need to deal with a massive number of parameters and need more neurons for deep networks, which can cause over-fitting.
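The weight counts above follow directly from the flattened input size; a small illustrative calculation:

```python
def fc_weights_per_neuron(height, width, channels):
    """Each fully connected neuron receives one weight per input value,
    so the count equals the flattened image size."""
    return height * width * channels

# A 30 x 30 RGB image: 2700 weights per first-hidden-layer neuron.
small = fc_weights_per_neuron(30, 30, 3)

# A 250 x 250 RGB image: 187,500 weights per neuron.
large = fc_weights_per_neuron(250, 250, 3)

print(small, large)  # 2700 187500
```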
In a convolutional neural network (CNN), the neuron in a layer will only connect to a small number of neurons in the previous layer, instead of all neurons in a fully connected way, which is why we need to handle fewer weights and fewer neurons. This is the reason that CNN performs better than fully connected neural networks for image classification tasks [58]. The CNN architecture has two important layers: convolutional layers and pooling layers. The convolutional layer performs the convolution operation on images to extract feature maps and the pooling layer reduces the size of the feature maps by performing down-sampling.
Convolutional neural networks with several hidden layers and a large number of parameters are highly effective for image classification tasks [59]. A CNN learns spatial patterns from images that are invariant to translation and captures the different features of images. VGG, ResNet, DenseNet, and Inception [60][61][62] are popular due to their outstanding performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [63]. Designing a network architecture is a tedious process and requires a lot of effort. Various architectures have been designed for different problems. For the detection of malaria-infected cells, the performance of various CNN architectures, e.g., a customized CNN model, VGG, ResNet, DenseNet, Inception, Xception, and SqueezeNet, is analyzed in this research. Each architecture is briefly introduced in the following sections.

VGG
This is a CNN model designed by the Visual Geometry Group (VGG) of Oxford University for large-scale image recognition [64]. The VGG model is one of the prominent models of the ILSVRC-2014 competition [63] and accomplishes 92.7% accuracy on the testing data of ImageNet. It has two flavors: VGG16 and VGG19. The former has 16 layers with 5 max-pooling layers and 5 blocks of convolutional layers, where each block has two or more convolutional layers. The convolutional layers of this model only use a 3 × 3 kernel size, and the max-pooling layers use a 2 × 2 kernel size. VGG16 outperforms the winning models of the ILSVRC-2012 and ILSVRC-2013 competitions. VGG19 has 19 layers with 5 max-pooling layers and 5 blocks of convolutional layers. The only difference between VGG19 and VGG16 is in the last 3 convolution blocks, which have 4 convolutional layers each in VGG19 and 3 convolutional layers each in VGG16.

ResNet
Residual Neural Network (ResNet) [60] is a convolutional neural network that can train a large number of layers with a compelling performance. It provides a state-of-the-art solution to the vanishing gradient problem. The training of neural networks relies on the back-propagation process, which needs a gradient descent to minimize the loss function and learns the model weights. For a large number of layers, the gradient becomes smaller, and even vanishes due to repeated multiplication, and performance degradation occurs with each additional layer. ResNet uses an identity skip connection that skips one or more layers and reuses the feature maps from previous layers. Skipping the layers squeezes the network, which allows faster learning. The layers expand during training and the residual portions of the network discover more features from the source image. ResNet has many flavors, such as ResNet-50, ResNet-101 and ResNet-152. ResNet-V2 differs from ResNet-V1 because it uses batch normalization before each weight layer.
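The identity skip connection can be sketched numerically in a few lines of NumPy (an illustrative toy, not a full ResNet block, which would also contain convolutions and batch normalization):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, transform):
    """Identity skip connection: the block learns a residual F(x),
    and the input x is added back before the final activation."""
    return relu(transform(x) + x)

x = np.array([1.0, -2.0, 3.0])

# Even if the learned transform collapses to zero, the block still
# passes x through (up to the ReLU), so gradients can keep flowing
# through the shortcut instead of vanishing.
out = residual_block(x, lambda v: np.zeros_like(v))
print(out)  # relu(x) = [1. 0. 3.]
```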

DenseNet
In the Dense Convolutional Neural Network (DenseNet) [61], every layer is connected with all preceding layers: the feature maps of previous layers are used as input to the new layer, and its feature map is used as input for the succeeding layers. In traditional convolutional networks, each layer has a direct connection only to the next layer, and the features of earlier layers are not passed further forward. DenseNet achieves good performance with a smaller amount of computation. DenseNet exploits the network's potential by reusing the learned features and not learning redundant features again in subsequent layers [65]. DenseNet concatenates the feature maps of previous layers with a new layer, while ResNet adds the feature maps of layers. Each layer of DenseNet learns a small set of new feature maps that require fewer parameters for training. DenseNet mitigates the vanishing-gradient problem by allowing each layer to directly access the gradients from the loss function. DenseNet has many flavors, such as DenseNet-121, DenseNet-169, and DenseNet-201.
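The contrast between DenseNet's concatenation and ResNet's addition is easy to see on toy feature maps (illustrative shapes only):

```python
import numpy as np

# Toy feature maps: height x width x channels
f1 = np.ones((4, 4, 8))   # output of an earlier layer
f2 = np.ones((4, 4, 8))   # output of the current layer

# ResNet combines feature maps by element-wise addition: shape unchanged.
added = f1 + f2

# DenseNet concatenates along the channel axis: channels accumulate,
# so every later layer sees the features of all preceding layers.
concatenated = np.concatenate([f1, f2], axis=-1)

print(added.shape, concatenated.shape)  # (4, 4, 8) (4, 4, 16)
```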

Inception
Inception is a deep neural network designed by Google that plays an important role in the development of convolutional network classifiers [62]. To enhance the performance of the model in terms of accuracy and speed, Inception uses several tricks. It has an inception module that performs convolution on the input with three filters of different sizes (1 × 1, 3 × 3, 5 × 5) and also performs max-pooling. The outputs of these 4 branches (3 convolutional and 1 pooling) are concatenated and fed to the next inception module in the network. To reduce the computational cost of the network, the number of input channels is limited by adding a 1 × 1 convolution before the 3 × 3 and 5 × 5 convolutions, because the 1 × 1 convolution is cheap and reduces the number of input channels [66]. Inception has many variants, such as Inception-v1, Inception-v2, Inception-v3, and Inception-ResNet-v2 [67]. The Inception model evolved by using smart factorization methods that reduce the cost of convolution operations. It also reduces the representational bottleneck, in which information is lost by reducing the dimensions of the input data. The Inception-v3 model contains 11 inception modules.
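The savings from the 1 × 1 bottleneck can be checked with simple multiply counts (the tensor sizes below are illustrative assumptions, not values from this paper):

```python
def conv_mults(h, w, in_ch, out_ch, k):
    """Approximate multiply count of a k x k convolution with 'same' padding."""
    return h * w * in_ch * out_ch * k * k

H, W, IN, OUT = 28, 28, 192, 32   # illustrative feature-map sizes

# Direct 5 x 5 convolution over all 192 input channels.
direct = conv_mults(H, W, IN, OUT, 5)

# 1 x 1 bottleneck down to 16 channels, then the 5 x 5 convolution.
BOTTLENECK = 16
reduced = conv_mults(H, W, IN, BOTTLENECK, 1) + conv_mults(H, W, BOTTLENECK, OUT, 5)

print(direct, reduced, round(direct / reduced, 1))
```

With these sizes the bottleneck cuts the multiply count by roughly an order of magnitude, which is why the cheap 1 × 1 convolutions pay for themselves.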

Xception
Xception (an extreme version of Inception) [68] is a deep neural network designed by Google (Mountain View, CA, USA). The Xception model modifies the depth-wise separable convolution of the Inception model by applying the 1 × 1 convolution before the channel-wise spatial convolution. The Xception network has 71 layers designed for large-scale image recognition. There is a non-linearity in the inception module after the first operation, but there is no in-between ReLU or ELU non-linearity in the Xception model; it achieves excellent accuracy without any intermediate activation. The Xception model also uses residual or identity skip connections to increase accuracy, which is the core idea of the ResNet network. The Xception model outperforms the VGG-16, ResNet-152, and Inception-v3 models trained on the ImageNet dataset [69]. The Xception architecture has the same number of parameters as Inception-v3, but due to the effective use of these parameters, the performance of the Xception model is improved [68]. These performance gains are not due to increased capacity but due to the more efficient use of model parameters.
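The efficiency of depth-wise separable convolution can likewise be quantified with multiply counts (illustrative sizes; the split into depth-wise and point-wise stages has the same cost whichever of the two comes first):

```python
def standard_conv_mults(h, w, in_ch, out_ch, k):
    """Multiplies for a standard k x k convolution ('same' padding)."""
    return h * w * in_ch * out_ch * k * k

def separable_conv_mults(h, w, in_ch, out_ch, k):
    depthwise = h * w * in_ch * k * k   # one k x k filter per input channel
    pointwise = h * w * in_ch * out_ch  # 1 x 1 cross-channel mixing
    return depthwise + pointwise

H, W, IN, OUT, K = 56, 56, 64, 128, 3   # illustrative sizes
std = standard_conv_mults(H, W, IN, OUT, K)
sep = separable_conv_mults(H, W, IN, OUT, K)
print(std, sep, round(std / sep, 1))
```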

SqueezeNet
SqueezeNet [70] is a small neural network with 18 layers that can fit into a small amount of memory and requires less bandwidth over a computer network. It has 50-times fewer model parameters than AlexNet, yet it achieves AlexNet-level accuracy. In SqueezeNet, 3 × 3 filters are replaced by 1 × 1 filters, and the number of input channels to the 3 × 3 filters is decreased by using squeeze layers to reduce model parameters. Delayed down-sampling in the network retains large activation maps that maximize accuracy with limited model parameters. SqueezeNet has a Fire module in which the squeeze layer performs convolution with 1 × 1 filters, and its output is forwarded to an expand layer that has a mixture of 1 × 1 and 3 × 3 convolutions. SqueezeNet's model size is 50× smaller than AlexNet's, a much higher reduction than that achieved by SVD or deep compression. SqueezeNet has many flavors, such as vanilla SqueezeNet and SqueezeNet with simple or complex bypass. The attributes of the DL architectures described in this section are summarized in Table 1.


Proposed Deep Learning Model for Malaria Detection
In this section, we present a novel neural architecture for malaria detection from microscopic thin blood smear images. The proposed method can be divided into three parts: data preprocessing, feature extraction, and classification. Figure 2 shows these steps in a schematic diagram. Before introducing the proposed DL model for efficient malaria detection, we introduce a data pre-processing step that is effective for improving the quality of the images. During data acquisition, the images may get polluted with various types of noise arising from, e.g., the camera angle and microscope positioning. Different noise removal methods have been proposed in the literature to eliminate such kinds of noise from the images. These methods include simple blur operators, such as the averaging filter, and non-linear filters, e.g., the median filter. In our case, the RBC scans are of low resolution and contain important information about the parasite presentation that could be degraded or lost if simple blurring methods were used. We need an image denoising method that removes the noise from the image while preserving the structural information present in the image. We found the bilateral filter [71] to be quite effective in this scenario, as shown in [11].
In conventional image blurring methods, each pixel contributes to the computation of the new pixel value based only on its distance from the filter center. The filter weights in the bilateral filter consider both the spatial distance of the pixel from the filter center and the color/intensity difference between the pixels. The former factor introduces the blur in the image, while the latter preserves the structural information in the image.
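The bilateral weighting described above can be sketched directly in NumPy (an educational grayscale implementation under assumed parameter values; in practice a library routine such as OpenCV's `cv2.bilateralFilter` would be used for speed):

```python
import numpy as np

def bilateral_filter(img, d=2, sigma_s=2.0, sigma_r=30.0):
    """Naive bilateral filter for a grayscale image.
    Each output pixel is a weighted mean over its (2d+1) x (2d+1)
    neighborhood, weighted by spatial distance (sigma_s) and by
    intensity difference from the center pixel (sigma_r)."""
    img = img.astype(np.float64)
    padded = np.pad(img, d, mode="edge")
    out = np.zeros_like(img)
    # The spatial kernel g_s depends only on offsets, so compute it once.
    yy, xx = np.mgrid[-d:d + 1, -d:d + 1]
    g_s = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            patch = padded[x:x + 2*d + 1, y:y + 2*d + 1]
            # Range kernel g_r: down-weights pixels whose intensity
            # differs from the center, which is what preserves edges.
            g_r = np.exp(-((patch - img[x, y])**2) / (2 * sigma_r**2))
            weights = g_s * g_r
            out[x, y] = np.sum(weights * patch) / np.sum(weights)
    return out

# A flat region stays flat: filtering constant input returns it unchanged.
flat = np.full((8, 8), 100.0)
print(np.allclose(bilateral_filter(flat), flat))  # True
```

On a step image, pixels across the edge receive near-zero range weights, so the edge stays sharp while uniform regions are smoothed.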
Let I be an input image of size M × N which is subject to bilateral filtering with a kernel of size (2d + 1) × (2d + 1). The value of pixel (x, y) is computed as

\bar{I}(x, y) = \frac{1}{w} \sum_{i=x-d}^{x+d} \sum_{j=y-d}^{y+d} g_s(i, j)\, g_r(i, j)\, I(i, j), (1)

where \bar{I} is the filtered image, w = \sum_{i,j} g_s(i, j)\, g_r(i, j) is the normalizing factor, and g_r is the range kernel computed as

g_r(i, j) = \exp\!\left(-\frac{(I(i, j) - I(x, y))^2}{2\sigma_r^2}\right). (2)

The spatial kernel g_s weighs the distance of pixel (i, j) from the center pixel (x, y). It is calculated as

g_s(i, j) = \exp\!\left(-\frac{(i - x)^2 + (j - y)^2}{2\sigma_s^2}\right), (3)

where σ_r and σ_s are variance parameters. All images are filtered using Equation (1) and resized to equal dimensions (125 × 125), as the deep learning models require the same input shape for all images of the dataset. Figure 3 shows the results of applying bilateral filtering on sample infected and uninfected cell images.

We propose a customized CNN model for efficient malaria detection through the classification of infected and uninfected RBC images. The input images with 125 × 125 × 3 dimensions are fed to a model with 5 convolutional layers, 5 max-pooling layers, and 2 fully-connected layers. The proposed model is shown in Figure 4. The convolutional layers of the model use a 3 × 3 kernel size, the ReLU activation function, and 32, 64, 128, 256, and 300 filters for the five layers, respectively. By visualizing the convolutional layers of the customized CNN model in Figure 5, it can be seen that the initial layer extracts low-level features that are more recognizable to humans, while the final convolutional layer extracts high-level features that are more model-recognizable. All max-pooling layers of the model follow the convolutional layers and use a 2 × 2 pool size with a stride of 2 pixels, down-sampling the feature maps produced by the convolutional layers. The output of the fourth pooling layer is fed to the fully connected (FC) layers with 0.5 dropout after each FC layer, and the output of the final FC layer is fed to the sigmoid classifier.
The model is trained with the following hyper-parameters: a batch size of 64, 25 epochs, the ADAM optimizer, binary cross-entropy loss, and a 0.5 dropout ratio for regularization, as many studies recommend this dropout ratio [72]. The dropout layers are used to reduce over-fitting and to generalize the results of the model.
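The architecture and training setup described above can be sketched in Keras as follows. This is a reconstruction from the stated layer and hyper-parameter values, not the authors' released code; in particular, the width of the fully connected layers is not stated in the text, so the value of 512 here is an illustrative assumption.

```python
# Sketch of the customized CNN, assuming TensorFlow/Keras.
from tensorflow import keras
from tensorflow.keras import layers, models

def build_model(input_shape=(125, 125, 3)):
    model = models.Sequential([keras.Input(shape=input_shape)])
    # Five conv blocks with 3x3 kernels, ReLU, and 2x2 max-pooling (stride 2).
    for filters in (32, 64, 128, 256, 300):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(layers.Flatten())
    # Two fully connected layers with 0.5 dropout each (width assumed).
    for units in (512, 512):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(0.5))
    # Sigmoid output for the binary infected/uninfected decision.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
# Training would then be: model.fit(train_data, epochs=25, batch_size=64, ...)
```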

Experimental Evaluations and Results
In this section, we evaluate the performance of different deep learning architectures for malaria detection using the NIH Malaria dataset [54]. A time complexity analysis is also presented for these models. Moreover, the performance of the proposed method is also compared with the existing malaria-detection algorithms using different statistical measures.

Data Acquisition
The NIH Malaria dataset [54], used to evaluate the performance of the compared models, is publicly available from the National Institutes of Health (NIH) (https://lhncbc.nlm.nih.gov/LHC-publications/pubs/MalariaDatasets.html, accessed on 18 February 2021). Researchers developed a smartphone application attached to a traditional light microscope [9] for the screening of infected and uninfected red blood cell images at the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM). Thin blood smear films of 50 healthy and 150 P. falciparum-infected patients were prepared using the Giemsa staining solution to enhance the visibility of parasites and were photographed at Chittagong Medical College Hospital, Bangladesh. Images of the blood films were captured with the built-in smartphone camera for each microscopic field of view. These images were then manually annotated by expert slide readers at the Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand. The dataset has 27,558 cell images that are equally balanced between 13,779 parasitized and 13,779 uninfected. The cell images have variations in color distributions due to different blood stains used during the process of data acquisition. Figure 6 shows samples of parasitized and uninfected segmented red blood cell images from the malaria dataset.

Data Preprocessing
The NIH malaria dataset is balanced, with 13,779 parasitized and 13,779 uninfected cell images. To evaluate the performance of the deep learning models, the dataset is split into training, testing, and validation subsets. To this end, 60% of the data is used for training, 10% for validation, and the remaining 30% of unseen data for testing the performance of the trained model. Partitioning details of the dataset are presented in Table 2. Table 2. Partitioning of the test dataset into training, testing, and validation datasets before performing data augmentation.

Dataset       Parasitized   Uninfected
Training      8604          8766
Testing       4196          4076
Validation    979           952

Data augmentation techniques are usually used to improve the accuracy of deep learning models by providing variety in the dataset [73]. Neural networks require large datasets for better generalization and to avoid over-fitting. The image data generator builds powerful deep learning models from a small dataset [74]. After resizing the images of the dataset, the ImageDataGenerator module of the Keras library (https://keras.io/, accessed on 18 February 2021) is used for augmenting the malaria cell images of the training data by applying the following image transformation operations: a 0.1 zoom value, 25-degree rotation, and 0.05 shear range with horizontal flip, plus (0.1, 0.1) translation for shifting both width and height. Image augmentation is not applied to the testing and validation data because model performance is evaluated on them. After data augmentation, the size of the training dataset is 173,700.
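The augmentation settings listed above map directly onto Keras' `ImageDataGenerator` (a sketch assuming TensorFlow/Keras; the placeholder batch is random data, not the malaria images):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Transformation settings from the text: zoom 0.1, rotation 25 degrees,
# shear 0.05, horizontal flip, and (0.1, 0.1) width/height shift.
datagen = ImageDataGenerator(
    zoom_range=0.1,
    rotation_range=25,
    shear_range=0.05,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# Augmentation is applied on the fly to training batches only;
# validation and test images are left untransformed.
x_train = np.random.rand(8, 125, 125, 3)            # placeholder batch
y_train = np.random.randint(0, 2, size=(8,))
batch_x, batch_y = next(datagen.flow(x_train, y_train, batch_size=8))
print(batch_x.shape)  # (8, 125, 125, 3)
```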

Implementation Details
All experiments are performed on Google Colaboratory with a GPU runtime, 28 GB of RAM, and a 68 GB hard disk. All models are implemented in Python 3.6 using the Keras deep learning library with the TensorFlow backend (https://www.tensorflow.org/, accessed on 18 February 2021). The architectures of the deep learning CNN models were obtained from publicly available implementations and trained with hyper-parameters such as batch sizes of 32-64, learning rates of 1 × 10⁻⁴ to 1 × 10⁻⁵, 20-30 epochs, the RMSProp or ADAM optimizer, binary cross-entropy loss, and dropout ratios of 0.3-0.5 for regularization. The models were initialized with random weights and trained on the training data using the TensorFlow library.
The cross-validation technique uses an independent dataset to evaluate the performance of machine learning models on unseen data. To this end, the sample dataset is partitioned into k subsets; training is performed on k − 1 subsets and validation on the remaining subset. We evaluated the predictive models through five-fold cross-validation over 5 different test sets, with each fold having 2756 test samples and 24,802 training samples. The training data are randomly partitioned into 5 equal-sized subsets; one subset is used for validation and the remaining 4 subsets are used for training. The cross-validation process is repeated 5 times for the proposed model, with each of the 5 subsets used exactly once as the validation data. The validation results are averaged to produce a single score.
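The five-fold partitioning described above can be sketched in a few lines of plain Python. This is an index-based illustration under our own assumptions, not the authors' implementation:

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Partition sample indices into k roughly equal folds.

    Returns a list of (train_indices, val_indices) pairs, one per fold,
    so that every index appears exactly once as validation data.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible random partition
    fold_size = n_samples // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Distribute any remainder over the first folds.
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, val))
    return splits

splits = k_fold_indices(27558, k=5)  # 27,558 = total cell images
```

Each model is then trained k times, once per (train, val) pair, and the k validation scores are averaged into the single reported figure.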

Performance Evaluation and Comparison
For binary deep learning models, the confusion matrix is used to describe the prediction results on labeled test data. There are four possible outcomes of such a test: true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN). The true-positive count represents the infected cells correctly diagnosed as infected, and the true-negative count denotes the number of uninfected cells correctly diagnosed as uninfected. The uninfected cells incorrectly diagnosed as infected are counted as false-positives, and the infected cells incorrectly diagnosed as uninfected as false-negatives. Different parametric and non-parametric statistical measures are used in the performance evaluation, including Specificity, Sensitivity, Precision, Accuracy, F1 score, Matthews correlation coefficient (MCC), and Cohen's kappa (κ) [75-77].
Specificity calculates the proportion of actual negatives out of the total negative observations predicted by an algorithm. It is also known as the true negative rate and is computed as

Specificity = TN / (TN + FP). (4)

Sensitivity is the ability of an algorithm to correctly predict the true positives out of the total positive observations. It is also known as the true positive rate or Recall:

Sensitivity = TP / (TP + FN). (5)
Precision shows the extent of correctness of an algorithm in terms of positive results. It is computed as the proportion of actual positives out of the total positive predictions:

Precision = TP / (TP + FP). (6)
The ability of an algorithm to flawlessly differentiate between healthy and infected subjects is called accuracy. In the case of the malaria diagnosis model, if an infected cell image is predicted correctly by the model as infected, and vice versa, then the model has high accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN). (7)
The F1 score is the weighted harmonic mean of the precision and recall measures. It provides an overall accuracy of the model using both positive and negative predictions, and is considered a more reliable performance measure than the accuracy and precision metrics, which can be misleading when the data are highly unbalanced [78,79]. For a classification model that balances precision and recall, the F1 score on test data should be high:

F1 = 2 × (Precision × Recall) / (Precision + Recall). (8)
The Matthews correlation coefficient (MCC) evaluates the performance of a binary classification model on a scale of −1 to +1: the minimum value −1 indicates a poor classifier, while the maximum value +1 indicates a perfect classifier. The MCC is regarded as a balanced measure because it considers both positive and negative observations; a recent study [80] showed that the MCC is a more informative and realistic measure than other parametric statistical measures. It is computed as:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). (9)

Cohen's kappa (κ) measures the agreement between the total accuracy and the random (chance) accuracy of a model. The kappa value, computed as Equation (10), ranges from 0 to 1, where the lowest value 0 means no agreement and the highest value 1 means complete agreement:

κ = (p_o − p_e) / (1 − p_e), (10)

where p_o is the accuracy of the model (Equation (7)) and p_e is the hypothetical probability of chance agreement, computed as

p_e = ((TP + FP)(TP + FN) + (TN + FP)(TN + FN)) / (TP + TN + FP + FN)². (11)

All these performance metrics are computed for each deep learning architecture, and the results are summarized in Table 3. The results reveal that the proposed malaria detection algorithm performs better than the compared deep learning models. The proposed method achieves more than 0.96 in both accuracy and F1 score, and more than 0.93 in both MCC and κ. The malaria detector based on the VGG19 architecture also achieves appreciable results, with F1 and κ scores of 0.9592 and 0.9185, respectively.
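All of these metrics derive from the four confusion-matrix counts. The following self-contained Python sketch (our own illustration, not the paper's evaluation code) implements them:

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Hypothetical probability of chance agreement for Cohen's kappa.
    p_e = ((tp + fp) * (tp + fn) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"specificity": specificity, "sensitivity": sensitivity,
            "precision": precision, "accuracy": accuracy,
            "f1": f1, "mcc": mcc, "kappa": kappa}

# Hypothetical counts for demonstration only (not the paper's results).
m = confusion_metrics(tp=90, tn=85, fp=15, fn=10)
```

Note that MCC and κ penalize chance-level predictions that plain accuracy rewards, which is why they are reported alongside accuracy in Table 3.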

Performance Comparison with Existing Methods
We also compare the performance of the proposed CNN model with other malaria detectors to evaluate its effectiveness. The compared methods include state-of-the-art and widely accepted automated malaria detection algorithms [11,12,51-56]. The malaria detection algorithm proposed by Fatima et al. [11] uses an adaptive thresholding technique with morphological image-processing tools; specifically, object contours and the eight-connected rule are exploited to confirm the existence of malaria parasites in the cell. The malaria detector presented by Das et al. [12] uses a marker-controlled watershed approach to segment the erythrocytes; various features describing the shape, size, and texture characteristics of the segmented erythrocytes are computed and classified with a support vector machine. In the algorithm of Hung et al. [51], a faster region-based convolutional neural network (Faster R-CNN) is proposed for malaria parasite detection; the model is pre-trained on ImageNet and fine-tuned on the malaria dataset. Sanchez [55] also uses a deep convolutional neural network for malaria parasite detection. Pan et al. [52] proposed a deep convolutional-neural-network-based algorithm for malaria detection that uses Otsu's method and morphological operations for RBC segmentation and a LeNet-5 [81]-based CNN architecture for cell classification. A transfer-learning-based approach is presented by Vijayalakshmi et al. [53] for identifying malaria-infected cells in microscopic blood images; the presented model is a unification of VGG and a support vector machine. The malaria detection algorithm introduced by Rajaraman et al. [54] consists of a three-layer convolutional neural network with a fully connected sequential architecture. A deep-belief-network-based malaria detection approach is proposed by Bibin et al. [56].
The performance achieved by the proposed and compared methods is presented in Table 4. The datasets used in some of the compared methods differ from our test dataset, so such a comparison might not be entirely fair; nonetheless, it helps assess the effectiveness of the proposed method. The results show that our method performs better than the compared methods in all performance metrics except sensitivity, where the Das [12] algorithm performs better; however, its specificity and accuracy are considerably lower than those of the proposed method. The performance results of the Rajaraman, Fatima, and proposed algorithms are computed over the same dataset. These statistics show that the proposed method performs best, followed by the Rajaraman method, which also achieves a good detection rate.
The results presented in Table 4 indicate that the proposed deep malaria detector is an efficient and reliable tool for testing microscopic blood smears for Plasmodium parasite infection. The proposed method exploits bilateral filtering to suppress the noise in the images and provide better-quality images for model training. Moreover, the data augmentation techniques introduce variations in the dataset that help improve performance and also generalize the model to avoid over-fitting. It is also worth noting that the computational cost of the proposed method is reduced by using a 3 × 3 convolution filter for all convolutional layers, which keeps the number of trainable parameters per layer small. The deep learning models discussed in Table 4 have at least 16 layers and a large number of training parameters; in contrast, the proposed CNN architecture has only 8 layers and requires fewer training parameters than the other deep learning models.
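The parameter saving from small kernels can be made concrete with the standard parameter-count formula for a 2D convolutional layer. The layer sizes below are hypothetical examples for illustration, not the paper's actual architecture:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Number of trainable parameters in a 2D convolutional layer."""
    params = kernel_h * kernel_w * in_channels * out_channels
    if bias:
        params += out_channels  # one bias per output channel
    return params

# Hypothetical layer with 64 input and 64 output channels:
small = conv2d_params(3, 3, 64, 64)   # 3x3 kernel
large = conv2d_params(7, 7, 64, 64)   # 7x7 kernel
```

For this example, the 3 × 3 layer needs roughly 5× fewer parameters than a 7 × 7 layer with the same channel counts, which is the main reason small kernels keep both training time and memory footprint low.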

Time Complexity Analysis
The training time of deep learning models increases with the number of parameters and weights. To evaluate the time complexity of the deep learning models used in our study, we observe their training time for 25 epochs; the results are shown in Figure 7a. We recall that the 173,700 images of the augmented training dataset are used in training. The results show that our model takes around 25 min for training, which is less than all the compared models. The reason for this computational efficiency is its small number of convolutional layers with few parameters, which require less time to train. The VGG-19 and Inception models exhibit the best training times among the rest of the models. We also evaluated the testing time of the compared models on the complete testing set of 8272 images; the results are presented in Figure 7b. The proposed model takes 5 s for the complete testing dataset, less than any of the other compared models. Among the rest of the models, the SqueezeNet model takes 6 s for testing, although it took around 30 min for training.

Conclusions
In this paper, we proposed a deep learning solution for automated malaria detection from microscopic blood smears. The proposed CNN model exploits bilateral filtering to remove noise from images and uses image augmentation techniques to achieve generalization. We also evaluated different deep learning models for the malaria detection problem and compared their performance with the proposed method, as well as with existing automated malaria detectors. The experimental evaluations performed on a benchmark dataset show that the proposed method performs better than the other deep learning models. The proposed method also outperforms the compared malaria detection algorithms, achieving more than 0.96 in both accuracy and F1 score. A time complexity analysis shows that our method is computationally efficient.

Institutional Review Board Statement: The research conducted in this paper utilized the publicly available NIH Malaria Dataset; therefore, no formal approval was required.